Abstract

In two-player zero-sum games, if both players minimize their average external 
regret, then the average of the strategy profiles converges to a Nash equilib- 
rium. For n-player general-sum games, however, theoretical guarantees for 
regret minimization are less understood. Nonetheless, Counterfactual Regret 
Minimization (CFR), a popular regret minimization algorithm for extensive- 
form games, has generated winning three-player Texas Hold'em agents in 
the Annual Computer Poker Competition (ACPC). In this paper, we pro- 
vide the first set of theoretical properties for regret minimization algorithms 
in non-zero-sum games by proving that solutions eliminate iterative strict 
domination. We formally define dominated actions in extensive-form games, 
show that CFR avoids iteratively strictly dominated actions and strategies, 
and demonstrate that removing iteratively dominated actions is enough to 
win a mock tournament in a small poker game. In addition, for two-player 
non-zero-sum games, we bound the worst case performance and show that 
in practice, regret minimization can yield strategies very close to equilib- 
rium. Our theoretical advancements lead us to a new modification of CFR 
for games with more than two players that is more efficient and may be used 
to generate stronger strategies than previously possible. Furthermore, we 
present a new three-player Texas Hold'em poker agent that was built using 
CFR and a novel game decomposition method. Our new agent wins the 
three-player events of the 2012 ACPC and defeats the winning three-player 
programs from previous competitions while requiring fewer resources to
generate than the 2011 winner. Finally, we show that our CFR modification
computes a strategy of equal quality to our new agent in a quarter of the 
time of standard CFR using half the memory. 
1. Introduction

Normal-form games are a common and general framework useful for mod- 
elling problems involving single, simultaneous decisions made by multiple 
agents. When decisions are sequential and involve imperfect information or 
stochastic events, extensive-form games are generally more practical. 

A common solution concept in games is a Nash equilibrium strategy pro- 
file that guarantees no player can gain utility by unilaterally deviating from 
the profile. For two-player zero-sum games, a Nash equilibrium is a powerful 
notion. In such domains, every Nash equilibrium profile results in the players 
earning their unique game value, and playing a strategy belonging to a Nash 
equilibrium guarantees a payoff no worse than the game value. In n-player 
general-sum games, these strong guarantees are lost. Each Nash equilibrium 
may provide different payoffs to the players and no guarantee can be made 
when more than one player deviates from a specific equilibrium profile. Re- 
gardless, no practical algorithms are known for computing an equilibrium in 
even moderately-sized games with more than two players. 

Counterfactual Regret Minimization (CFR) [1] is a state-of-the-art
algorithm for approximating Nash equilibria of large two-player zero-sum
extensive-form games. CFR is an iterative, off-line regret minimizer that
stores two strategy profiles, the current profile that is being played at the
present iteration, and the average profile that accumulates a running average
of all previous profiles generated. In two-player zero-sum games, the average
profile approaches a Nash equilibrium and is generally used in practice, while
the current profile is discarded. CFR can also be applied to non-zero-sum
games and games with more than two players, but the average profile does
not necessarily approximate an equilibrium in such cases [2, Table 2].
Previous work provides no theoretical insights into the average profile
outside of two-player zero-sum games.

Email address: rggibson@cs.ualberta.ca (Richard Gibson)
URL: http://cs.ualberta.ca/~rggibson/ (Richard Gibson)

Preprint submitted to Artificial Intelligence
May 2, 2013

Nonetheless, CFR has been applied successfully to games that are not 
two-player zero-sum. For example, CFR was used to generate more aggres- 
sive, or tilted, poker strategies from non-zero-sum games capable of defeating 
top poker professionals in two-player limit Texas Hold'em [3]. In addition, 
winning three-player Texas Hold'em poker strategies in the Annual Computer 
Poker Competition (ACPC) [4] have been constructed using CFR [2, 5]. As 
CFR's memory requirements are linear in the size of the game, a common ap- 
proach in poker is to employ a state-space abstraction that merges different 
card deals into buckets, leaving hands in the same bucket indistinguishable 
[6, 7]. Three-player limit Texas Hold'em contains over $10^{17}$ decision states,
and so many hands must be merged for CFR to be feasible. In 2011, the 
winning three-player agent combated this problem through heads-up expert 
strategies [2] that merged fewer hands and only acted in common two-player 
scenarios resulting from one player folding early in a hand. While CFR has 
been successful in these games, a reason why CFR might be successful in 
such domains has not been given. 

In this paper, we provide the first theoretical groundings for regret min- 
imization algorithms applied to games that are not two-player zero-sum. 
This is achieved by establishing elimination of iteratively dominated errors: 
mistakes where there exists an alternative that is guaranteed to do better, 
assuming the opponents do not make such errors themselves. The contri- 
butions of this paper are as follows. Firstly, we prove that in normal-form 
games, common regret minimization techniques eliminate (play with proba- 
bility zero) iteratively strictly dominated strategies. Secondly, we formally 
define dominated actions and prove that under certain conditions, CFR elim- 
inates iteratively strictly dominated actions and strategies. Thirdly, for two- 
player non-zero-sum games, we bound the average profile's worst-case per- 
formance, providing a theoretical understanding of tilted poker strategies. 
Fourthly, our theoretical results lead us to a simple modification of CFR for 
games with more than two players that only uses the current profile and does 
not average. We demonstrate that with this change, CFR generates poker
strategies that perform just as well as those generated without the change, 
but now require less time and less memory to compute. Furthermore, for 
large games requiring state-space abstraction, this reduction in memory al- 
lows finer-grained abstractions to be used by CFR, leading to even stronger 
strategies than previously possible. Fifthly, we develop a new three-player 
limit Texas Hold'em agent that, instead of using heads-up experts, varies 
its abstraction quality according to the estimated importance of each state. 
Our new agent wins the three-player events of the 2012 ACPC and defeats 
the previous years' champions, all while needing less computer memory to 
generate than the 2011 winner. 

The rest of this paper is organized as follows. Section 2 covers background 
material in game theory and solution concepts relevant to our work. Next, 
Section 3 discusses regret minimization and provides an overview of CFR in 
extensive-form games. We then formally define dominated actions in Section 
4 before proving our theoretical results in Section 5. Section 6 explores 
these theoretical findings and insights empirically across a number of different 
poker games. Our new champion three-player Texas Hold'em agent is then 
described and evaluated in Section 7. Finally, Section 8 concludes our work 
and discusses future research directions. Proof sketches are provided with 
the theorem statements, while full technical proofs are provided in Appendix 
A. 

2. Games 

2.1. Normal and Extensive Forms 

A finite normal-form game is a tuple $G = (N, A, u)$ where
$N = \{1, \ldots, n\}$ is the set of players, $A = A_1 \times \cdots \times A_n$
is the set of action profiles with $A_i$ being the finite set of actions
available to player $i$, and $u_i : A \to \mathbb{R}$ is the utility function
that denotes the payoff for player $i$ at each possible action profile. If
$n = 2$ and $u_1 = -u_2$, the game is two-player zero-sum (or simply
zero-sum). Otherwise, the game is non-zero-sum. Two-player normal-form games
are often represented by a matrix with rows denoting the row player's
actions, columns denoting the column player's actions, and entries indicating
utilities resulting from the row player's and column player's actions
respectively. A mixed strategy $\sigma_i$ for player $i$ is a probability
distribution over $A_i$, where $\sigma_i(a)$ is the probability that action
$a$ is taken under $\sigma_i$. The set of all such strategies for player $i$
is denoted $\Sigma_i$. Define the support of $\sigma_i$,
$\mathrm{supp}(\sigma_i)$, to be the set of actions assigned positive
probability by $\sigma_i$. A strategy profile $\sigma \in \Sigma$ is a
collection of strategies $\sigma = (\sigma_1, \ldots, \sigma_n)$, one for each
player. We let $\sigma_{-i}$ refer to the strategies in $\sigma$ excluding
$\sigma_i$, and $u_i(\sigma)$ to be the expected utility for player $i$ when
players play according to $\sigma$.
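
To make the notation concrete, the following minimal Python sketch computes $u_i(\sigma)$ by summing over all action profiles, weighting each by the product of the players' probabilities. The dictionary layout and the name `expected_utility` are assumptions made for illustration only, and matching pennies is used simply as a familiar zero-sum example.

```python
from itertools import product


def expected_utility(payoffs, profile, player):
    """Expected utility u_i(sigma) for `player` under a mixed strategy profile.

    payoffs: dict mapping an action profile (tuple with one action per player)
             to a tuple of utilities, one per player.
    profile: list of dicts; profile[i][a] is the probability player i plays a.
    """
    total = 0.0
    for actions in product(*(sigma.keys() for sigma in profile)):
        prob = 1.0
        for sigma, action in zip(profile, actions):
            prob *= sigma[action]
        total += prob * payoffs[actions][player]
    return total


# Matching pennies: a two-player zero-sum game (u_1 = -u_2).
MATCHING_PENNIES = {
    ("H", "H"): (1, -1), ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1), ("T", "T"): (1, -1),
}
uniform = {"H": 0.5, "T": 0.5}
always_heads = {"H": 1.0, "T": 0.0}

print(expected_utility(MATCHING_PENNIES, [uniform, uniform], 0))
print(expected_utility(MATCHING_PENNIES, [always_heads, always_heads], 0))
```

The loop is exponential in the number of players, which is acceptable for small matrix games like this one.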

Extensive-form games are often preferred to normal form when multiple 
decisions are made sequentially. Before providing the formal definitions, we 
describe Kuhn Poker, an extensive-form game that we will use as a running 
example throughout this paper. Kuhn Poker [8] is a zero-sum card game 
played with a three-card deck containing a Jack, Queen, and King. Each 
player antes one chip and is dealt one private card at random from the deck 
that no other player can see. There is a single round of betting starting with 
player 1, who may either check or bet one chip. If a bet is made, player 2 
can either fold and forfeit the hand, or call the one chip bet. When faced 
with a check, player 2 can either check or bet one chip, where a bet forces 
player 1 to either fold or call the bet. If neither player folds after the round 
of betting, then the player with the highest ranked card wins all of the chips 
played. 
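
The rules above can be summarized by a payoff function over the five terminal betting sequences. The sketch below is one possible encoding, not taken from the paper: the strings `"cc"`, `"bc"`, `"bf"`, `"cbc"`, `"cbf"` and the name `kuhn_payoff` are assumptions chosen for illustration.

```python
RANK = {"J": 0, "Q": 1, "K": 2}
SHOWDOWNS = {"cc": 1, "bc": 2, "cbc": 2}  # chips each player has put in


def kuhn_payoff(card1, card2, actions):
    """Utility for player 1 at a terminal betting sequence (player 2 gets the negative).

    actions uses 'c' for check/call, 'b' for bet, and 'f' for fold:
      "cc"  check, check      -> showdown for the antes
      "bc"  bet, call         -> showdown for antes plus bets
      "bf"  bet, fold         -> player 1 wins player 2's ante
      "cbc" check, bet, call  -> showdown for antes plus bets
      "cbf" check, bet, fold  -> player 2 wins player 1's ante
    """
    if actions == "bf":
        return 1
    if actions == "cbf":
        return -1
    if actions not in SHOWDOWNS:
        raise ValueError("not a terminal betting sequence: %r" % actions)
    stake = SHOWDOWNS[actions]
    return stake if RANK[card1] > RANK[card2] else -stake


print(kuhn_payoff("K", "Q", "cc"))
print(kuhn_payoff("J", "K", "bc"))
```

Because the game is zero-sum, only player 1's utility needs to be stored.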

In general, a finite extensive-form game with imperfect information
[9] is a tuple $\Gamma = (N, A, H, P, \sigma_c, u, \mathcal{I})$ that contains
a game tree with nodes corresponding to histories of actions $h \in H$ and
edges corresponding to actions $a \in A(h)$ available to player
$P(h) \in N \cup \{c\}$ (where again $N$ is the set of players and $c$ denotes
chance). For histories $h, h' \in H$, we call $h$ a prefix of history $h'$,
written $h \sqsubseteq h'$, if $h'$ begins with the sequence $h$. When
$P(h) = c$, $\sigma_c(h, a)$ is the (fixed) probability of chance generating
action $a$ at $h$. Terminal nodes correspond to terminal histories
$z \in Z \subseteq H$ that have associated utilities $u_i(z)$ for each player
$i$. We define $\Delta_i = \max_{z, z' \in Z} u_i(z) - u_i(z')$ to be the
range of utilities for player $i$. Non-terminal histories for player $i$,
$H_i$, are partitioned into information sets $I \in \mathcal{I}_i$
representing the different game states that player $i$ cannot distinguish
between. For example, in Kuhn Poker, player $i$ does not see the private card
dealt to the opponent, and thus every pair of histories differing only in the
private card of the opponent are in the same information set for player $i$.
For each $I \in \mathcal{I}_i$, the action sets $A(h)$ must be identical for
all $h \in I$, and we denote this set by $A(I)$. Define
$|A(\mathcal{I}_i)| = \max_{I \in \mathcal{I}_i} |A(I)|$ to be the maximum
number of actions available to player $i$ at any information set. We assume
perfect recall that guarantees players always remember information that was
revealed to them, the order it was revealed, and the actions they chose.
A behavioral strategy for player $i$, $\sigma_i \in \Sigma_i$, is a function
that maps each information set $I \in \mathcal{I}_i$ to a probability
distribution over $A(I)$. Denote $\pi^\sigma(h)$ as the probability of history
$h$ occurring if all players play according to
$\sigma = (\sigma_1, \ldots, \sigma_n)$. We can decompose
$\pi^\sigma(h) = \prod_{i \in N \cup \{c\}} \pi_i^\sigma(h)$ into each
player's and chance's contribution to this probability. Here,
$\pi_i^\sigma(h)$ is the contribution to this probability from player $i$ when
playing according to $\sigma_i$. Let $\pi_{-i}^\sigma(h)$ be the product of
all contributions (including chance) except that of player $i$. In addition,
let $\pi^\sigma(h, h')$ be the probability of history $h'$ occurring after
$h$, given that $h$ has occurred. Let $\pi_i^\sigma(h, h')$ and
$\pi_{-i}^\sigma(h, h')$ be defined similarly. Furthermore, we define the
probability of player $i$ reaching information set $I \in \mathcal{I}_i$ as
$\pi_i^\sigma(I) = \pi_i^\sigma(h)$ for any $h \in I$. This is well-defined
due to perfect recall as any two histories reaching the same information set
must have followed the same sequence of actions at previous, identical
information sets.

A strategy $s_i$ is pure if a single action is assigned probability 1 at every
information set; for each $I \in \mathcal{I}_i$, let $s_i(I)$ be this action.
Denote $S_i$ as the set of all pure strategies for player $i$. For a
behavioral strategy $\sigma_i$, define the support of $\sigma_i$ to be
$\mathrm{supp}(\sigma_i) = \{s_i \in S_i \mid \sigma_i(I, s_i(I)) > 0 \text{ for all } I \in \mathcal{I}_i\}$.
Note that normal form is a generalization of extensive form. An
extensive-form game $\Gamma$ can be represented in normal form $G$ by setting
the action set in $G$ for player $i$ to be the set of all pure strategies in
$\Gamma$ and assigning utility $u_i(s) = \sum_{z \in Z} \pi^s(z) u_i(z)$.
Then, every behavioral strategy $\sigma_i$ in $\Gamma$ has a
utility-equivalent mixed strategy in $G$ where the probability of selecting
$s_i$ is $\prod_{I \in \mathcal{I}_i} \sigma_i(I, s_i(I))$ [10]. However,
normal form is often impractical for even moderately-sized problems because
the size of the action set in $G$ is exponential in
$|\mathcal{I}_i| \cdot |A(\mathcal{I}_i)|$.

2.2. Solution Concepts 

In this paper, we consider the problem of computing a strategy profile
for a game for play against a set of unknown opponents. The most common
solution concept is the Nash equilibrium. For $\epsilon \geq 0$, a strategy
profile $\sigma$ is an $\epsilon$-Nash equilibrium if no player can
unilaterally deviate from $\sigma$ and gain more than $\epsilon$; i.e.,
$\max_{\sigma'_i \in \Sigma_i} u_i(\sigma'_i, \sigma_{-i}) \leq u_i(\sigma) + \epsilon$
for all $i \in N$. A 0-Nash equilibrium is simply called a Nash equilibrium.
For games with more than two players, computing a Nash equilibrium is hard
and belongs to the PPAD-complete class of problems [11-14]. Alternatively, we
consider a superset of Nash equilibria, particularly those profiles that avoid
iterative strict domination.
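
As an illustration of the $\epsilon$-Nash condition, the sketch below computes the smallest $\epsilon$ for which a given profile of a two-player normal-form game is an $\epsilon$-Nash equilibrium. It relies on the fact that expected utility is linear in a player's own mixed strategy, so checking pure deviations suffices. The nested-list layout `u[i][a][b]` and the name `nash_gap` are assumptions for this example.

```python
def nash_gap(u, sigma):
    """Smallest epsilon such that `sigma` is an epsilon-Nash equilibrium.

    u[i][a][b]: utility of player i (0 = row, 1 = column) when the row player
                picks a and the column player picks b.
    sigma:      (row_mix, col_mix), each a list of probabilities.
    """
    row, col = sigma
    n_rows, n_cols = len(row), len(col)

    def value(i):
        return sum(row[a] * col[b] * u[i][a][b]
                   for a in range(n_rows) for b in range(n_cols))

    # Best unilateral deviation for each player; a pure deviation is enough.
    best_row = max(sum(col[b] * u[0][a][b] for b in range(n_cols))
                   for a in range(n_rows))
    best_col = max(sum(row[a] * u[1][a][b] for a in range(n_rows))
                   for b in range(n_cols))
    return max(best_row - value(0), best_col - value(1))


MATCHING_PENNIES = [
    [[1, -1], [-1, 1]],   # row player's utilities
    [[-1, 1], [1, -1]],   # column player's utilities
]
print(nash_gap(MATCHING_PENNIES, ([0.5, 0.5], [0.5, 0.5])))
print(nash_gap(MATCHING_PENNIES, ([1.0, 0.0], [1.0, 0.0])))
```

The uniform profile has gap zero, while the profile in which both players always play their first action leaves the column player a profitable deviation.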
Definition 1. A strategy $\sigma_i$ for player $i$ is a strictly dominated
strategy if there exists another player $i$ strategy $\sigma'_i$ such that
$u_i(\sigma_i, \sigma_{-i}) < u_i(\sigma'_i, \sigma_{-i})$ for all
$\sigma_{-i} \in \Sigma_{-i}$.

Weak and very weak dominance, which allow equality instead of strict
inequality for all but one and for all opponent profiles respectively, have
also been studied. For each type of dominance, an iteratively dominated strategy
is any strategy that is either dominated or becomes dominated after suc- 
cessively removing iteratively dominated strategies from the game. In this 
paper, we focus on strict domination where it is well-known that iterated 
removal of strictly dominated strategies always results in the same set of 
remaining strategies, regardless of the order of removal [15]. 
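
The iterated removal procedure can be sketched for a two-player normal-form game as follows. This is a simplification and not the general definition: it only detects pure strategies that are strictly dominated by another pure strategy, whereas Definition 1 also allows mixed dominating strategies. The function name and the `u[i][a][b]` layout are assumptions for illustration.

```python
def iterated_strict_dominance(u):
    """Iteratively remove pure strategies strictly dominated by a pure strategy.

    u[i][a][b]: utility of player i (0 = row, 1 = column) when the row player
                picks a and the column player picks b.
    Returns the surviving (row indices, column indices).
    """
    rows = list(range(len(u[0])))
    cols = list(range(len(u[0][0])))
    changed = True
    while changed:
        changed = False
        for a in list(rows):
            # a is removed if some remaining row d beats it against every column.
            if any(all(u[0][d][b] > u[0][a][b] for b in cols)
                   for d in rows if d != a):
                rows.remove(a)
                changed = True
        for b in list(cols):
            if any(all(u[1][a][d] > u[1][a][b] for a in rows)
                   for d in cols if d != b):
                cols.remove(b)
                changed = True
    return rows, cols


# Neither row is dominated at first; once the column player's second action
# is removed, the row player's second action becomes strictly dominated.
EXAMPLE = [
    [[2, 0], [1, 3]],   # row player's utilities
    [[1, 0], [1, 0]],   # column player's utilities
]
print(iterated_strict_dominance(EXAMPLE))
```

In the example, the second row only becomes dominated after a column has been removed, which is exactly the situation the word "iteratively" refers to.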

Conitzer and Sandholm [16] prove that a strictly dominated strategy
$\sigma_i \in \Sigma_i$ in a normal-form game can be identified in time
polynomial in $|A_i| = |S_i|$ by showing that the objective of the linear
program

$$\begin{aligned}
\text{minimize} \quad & \sum_{s_i \in S_i} p_{s_i} \\
\text{subject to} \quad & \forall s_{-i} \in S_{-i}, \ \sum_{s_i \in S_i} p_{s_i} u_i(s_i, s_{-i}) \geq u_i(\sigma_i, s_{-i})
\end{aligned} \tag{1}$$

is less than 1, where each $p_{s_i}$ is a nonnegative real number. Iteratively
strictly dominated strategies can then be eliminated by repeatedly solving
this program and removing the dominated pure strategies from $S_i$ and
$S_{-i}$. However, this method is infeasible for large extensive-form games as the linear
programs would require an exponential number of constraints in the size of 
the game. Hansen et al. [17] develop a dynamic programming algorithm for 
partially observable stochastic games, a generalization of normal-form games, 
that removes iteratively very weakly dominated strategies, but is not prac- 
tical beyond small toy problems. Further insights are provided by Waugh's 
domination value [18] that attempts to measure the amount of utility lost 
through playing iteratively dominated strategies in zero-sum games. Waugh 
demonstrates a strong correlation between the domination value of a strategy 
with performance in a small poker game, suggesting that removal of domi- 
nated strategies is enough for good play. This particular work motivates our 
results in Section 5. 

         a        b
A      1, 0     0, 0
B      0, 0     2, 0
C     -1, 0     1, 0

Figure 1: A two-player non-zero-sum normal-form game, where the column player's utility
is always zero.

Two other generalizations of Nash equilibria, correlated and coarse
correlated equilibria, require a mechanism for correlation among the players.
Suppose an independent moderator selects a profile $\sigma^k$ from
$E = \{\sigma^1, \ldots, \sigma^K\}$ according to distribution $q$ and
privately recommends each player $i$ play strategy $\sigma_i^k$. Then
$(E, q)$ is a correlated equilibrium if no player has an incentive to
unilaterally deviate from any recommendation. A coarse correlated equilibrium
is similar but even more general, where for all $i \in N$, we only require
that

$$\sum_{k=1}^{K} q(k) u_i(\sigma^k) \geq \max_{\sigma'_i \in \Sigma_i} \sum_{k=1}^{K} q(k) u_i(\sigma'_i, \sigma^k_{-i}). \tag{2}$$

To not be in a coarse correlated equilibrium, a player would need incentive 
to deviate even before receiving a recommendation and the deviation must 
be independent of the recommendation. Without a mechanism for correla- 
tion, it is unclear how a practitioner should use a correlated equilibrium. 
In addition, while correlated equilibria remove dominated strategies [19], a 
coarse correlated equilibrium may lead to the recommendation of a strictly 
dominated strategy. For example, in the normal-form game in Figure 1, 
{(A, a) = 0.5, (B,b) = 0.25, (C, b) = 0.25} is a coarse correlated equilibrium 
with the row player's expected utility being 5/4, yet the strictly dominated 
row player strategy that always plays C is recommended 25% of the time. 
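
The example above can be checked with exact arithmetic. The sketch below encodes the row player's utilities from Figure 1 and verifies the coarse correlated equilibrium condition for the row player; the column player's utility is always zero, so the condition holds trivially for that player. Checking fixed pure deviations is enough because the deviation value is linear in the deviating strategy. All names in the block are chosen for illustration.

```python
from fractions import Fraction

# Row player's utilities in the game of Figure 1.
ROW_U = {
    ("A", "a"): 1, ("A", "b"): 0,
    ("B", "a"): 0, ("B", "b"): 2,
    ("C", "a"): -1, ("C", "b"): 1,
}
# The recommended distribution over pure profiles.
q = {
    ("A", "a"): Fraction(1, 2),
    ("B", "b"): Fraction(1, 4),
    ("C", "b"): Fraction(1, 4),
}


def follow_value(dist):
    """Row player's expected utility when following every recommendation."""
    return sum(p * ROW_U[profile] for profile, p in dist.items())


def fixed_deviation_value(dist, row_action):
    """Row player's expected utility when always playing `row_action` instead."""
    return sum(p * ROW_U[(row_action, col)] for (_, col), p in dist.items())


def strictly_dominates(x, y):
    """True if row action x earns strictly more than y against every column."""
    return all(ROW_U[(x, c)] > ROW_U[(y, c)] for c in ("a", "b"))


follow = follow_value(q)
best_fixed = max(fixed_deviation_value(q, r) for r in ("A", "B", "C"))
print(follow, best_fixed, strictly_dominates("B", "C"))
```

Following the recommendations is worth 5/4 to the row player, no fixed deviation is worth more than 1, and yet C, which B strictly dominates, is recommended with probability 1/4.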

3. Regret Minimization 

Given a sequence of strategy profiles $\sigma^1, \ldots, \sigma^T$, the
(external) regret for player $i$ is

$$R_i^T = \max_{\sigma'_i \in \Sigma_i} \sum_{t=1}^{T} \left( u_i(\sigma'_i, \sigma^t_{-i}) - u_i(\sigma^t) \right).$$

$R_i^T$ measures the amount of utility player $i$ could have gained by
following the single best fixed strategy in hindsight at all time steps
$t = 1, \ldots, T$. Theorem 1 below states a well-known result that relates
regret to Nash equilibria in zero-sum games:

Theorem 1. In a zero-sum game, for $\epsilon \geq 0$, if
$R_i^T / T \leq \epsilon$ for $i = 1, 2$, then the average strategy profile,
$\bar{\sigma}^T$ (defined later), is a $2\epsilon$-Nash equilibrium.

A proof is provided by Waugh [18, p. 11]. It is also well-known that in 
any game, minimizing internal regret, a stronger notion of regret, leads to a 
correlated equilibrium, but we only consider external regret here. 

3.1. Regret Matching and CFR 

Regret matching [20] is a very simple, iterative procedure that minimizes
average regret in a normal-form game. First, the initial profile $\sigma^1$ is
chosen arbitrarily. For each action $a \in A_i$, we store the accumulated
regret
$R_i^T(a) = \sum_{t=1}^{T} \left( u_i(a, \sigma^t_{-i}) - u_i(\sigma^t) \right)$
that measures how much player $i$ would rather have played $a$ at each time
step $t$ than follow $\sigma_i^t$. Successive strategies are then determined
according to

$$\sigma_i^{T+1}(a) = \frac{R_i^{T,+}(a)}{\sum_{b \in A_i} R_i^{T,+}(b)}, \tag{3}$$

where $x^+ = \max\{x, 0\}$ and actions are chosen arbitrarily when the
denominator is zero. One can show that

$$\frac{R_i^{T,+}}{T} = \max_{a \in A_i} \frac{R_i^{T,+}(a)}{T} \leq \frac{\Delta_i \sqrt{|A_i|}}{\sqrt{T}}. \tag{4}$$

A general proof is provided by Gordon [21], while a more direct proof is
provided by Lanctot [22, Theorem 2]. By Theorem 1, in zero-sum games the
average strategy profile, defined by
$\bar{\sigma}_i^T(a) = \sum_{t=1}^{T} \sigma_i^t(a) / T$, approaches a Nash
equilibrium as $T \to \infty$.
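
The update in equation (3) is short enough to sketch directly. The Python block below runs regret matching in self-play on the game of Figure 1, using expected (rather than sampled) utilities. Two choices here are assumptions of this sketch and not prescribed by the text: the initial profile is uniform, and when the denominator is zero, probability is spread uniformly over the actions with maximal regret.

```python
import math


def regret_matching(regrets):
    """Equation (3); ties among maximal regrets are broken uniformly
    when no regret is positive (one arbitrary choice among many)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    best = max(regrets)
    ties = [r == best for r in regrets]
    return [1.0 / sum(ties) if t else 0.0 for t in ties]


def self_play(u, iterations):
    """u[i][a][b]: utility of player i (0 = row, 1 = column)."""
    n_rows, n_cols = len(u[0]), len(u[0][0])
    regrets = [[0.0] * n_rows, [0.0] * n_cols]
    sums = [[0.0] * n_rows, [0.0] * n_cols]
    for _ in range(iterations):
        row = regret_matching(regrets[0])
        col = regret_matching(regrets[1])
        # Expected utility of each pure action against the opponent's mix.
        row_vals = [sum(col[b] * u[0][a][b] for b in range(n_cols))
                    for a in range(n_rows)]
        col_vals = [sum(row[a] * u[1][a][b] for a in range(n_rows))
                    for b in range(n_cols)]
        row_ev = sum(p * v for p, v in zip(row, row_vals))
        col_ev = sum(p * v for p, v in zip(col, col_vals))
        for a in range(n_rows):
            regrets[0][a] += row_vals[a] - row_ev
            sums[0][a] += row[a]
        for b in range(n_cols):
            regrets[1][b] += col_vals[b] - col_ev
            sums[1][b] += col[b]
    average = [[s / iterations for s in player] for player in sums]
    current = [regret_matching(regrets[0]), regret_matching(regrets[1])]
    return average, current, regrets


FIGURE_1 = [
    [[1, 0], [0, 2], [-1, 1]],   # row player's utilities (actions A, B, C)
    [[0, 0], [0, 0], [0, 0]],    # column player's utilities
]
T = 100
average, current, regrets = self_play(FIGURE_1, T)
max_positive_regret = max(max(r, 0.0) for r in regrets[0])
bound = 3 * math.sqrt(3) / math.sqrt(T)  # Delta_i = 3 and |A_i| = 3 for the row player
print(average[0], current[0], max_positive_regret / T, bound)
```

In this small game the row player's regret for C becomes negative after the first iteration, so the next strategy assigns C probability zero, and the average positive regret stays well inside the bound of equation (4).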

Regret matching requires storage of $R_i^T(a)$ for all $a \in A_i$. Thus, it
is infeasible to directly apply regret matching to even moderately-sized
extensive-form games due to the resulting exponential size of the action (pure
strategy) space. Alternatively, Counterfactual Regret Minimization (CFR)
[1] is a state-of-the-art algorithm that minimizes average regret while only
requiring storage proportional to $|\mathcal{I}_i| \cdot |A(\mathcal{I}_i)|$
in the extensive-form game. Pseudocode is provided in Algorithm 1. On each
iteration $t$ and for each player $i$, the expected utility for player $i$ is
computed at each information set $I \in \mathcal{I}_i$ under the current
profile $\sigma^t$, assuming player $i$ plays to reach $I$. This expectation
is the counterfactual value for player $i$,

$$v_i(I, \sigma) = \sum_{z \in Z_I} u_i(z) \, \pi^\sigma_{-i}(z[I]) \, \pi^\sigma(z[I], z),$$
Algorithm 1 Counterfactual Regret Minimization (Zinkevich et al. 2008)
 1: Initialize regret: $\forall I, a \in A(I) : R(I, a) \leftarrow 0$
 2: Initialize cumulative profile: $\forall I, a \in A(I) : s(I, a) \leftarrow 0$
 3: Initialize current profile: $\forall I, a \in A(I) : \sigma(I, a) = 1/|A(I)|$
 4: for $t \in \{1, 2, \ldots, T\}$ do
 5:     for $i \in N$ do
 6:         for $I \in \mathcal{I}_i$ do
 7:             $\sigma_i(I, \cdot) \leftarrow \text{RegretMatching}(R(I, \cdot))$
 8:             for $a \in A(I)$ do
 9:                 $R(I, a) \leftarrow R(I, a) + v_i(I, \sigma_{(I \to a)}) - v_i(I, \sigma)$
10:                 $s(I, a) \leftarrow s(I, a) + \pi_i^\sigma(I) \sigma_i(I, a)$
11:             end for
12:         end for
13:     end for
14: end for

where $Z_I$ is the set of terminal histories passing through $I$ and $z[I]$ is
the history leading to $z$ contained in $I$. For each action $a \in A(I)$,
these values determine the counterfactual regret at iteration $t$,
$r_i^t(I, a) = v_i(I, \sigma^t_{(I \to a)}) - v_i(I, \sigma^t)$, where
$\sigma_{(I \to a)}$ is the profile $\sigma$ except at $I$, action $a$ is
always taken. The regret $r_i^t(I, a)$ measures how much player $i$ would
rather play action $a$ at $I$ than follow $\sigma_i^t$ at $I$. These regrets
are accumulated to obtain the cumulative counterfactual regret,
$R_i^T(I, a) = \sum_{t=1}^{T} r_i^t(I, a)$, that define the current strategy
profile via regret matching at $I$,

$$\sigma_i^{T+1}(I, a) = \frac{R_i^{T,+}(I, a)}{\sum_{b \in A(I)} R_i^{T,+}(I, b)}. \tag{5}$$

This procedure minimizes average regret according to the bound

$$\frac{R_i^T}{T} \leq \frac{\Delta_i |\mathcal{I}_i| \sqrt{|A(\mathcal{I}_i)|}}{\sqrt{T}} \quad \text{[1, Theorem 4]}. \tag{6}$$

During computation, CFR stores a cumulative profile
$s_i^T(I, a) = \sum_{t=1}^{T} \pi_i^{\sigma^t}(I) \sigma_i^t(I, a)$. Once CFR
is terminated after $T$ iterations, the output is the average strategy profile
$\bar{\sigma}_i^T(I, a) = s_i^T(I, a) / \sum_{b \in A(I)} s_i^T(I, b)$.
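
To show how the regret and cumulative-profile updates fit together, here is a compact Python sketch of CFR on Kuhn Poker. It is a simplified variant written for illustration and is not the paper's implementation: all six deals are enumerated on every iteration, both players' regrets are updated in the same recursive traversal, and current strategies are refreshed only at the end of each iteration. The action labels (`'p'` for check or fold, `'b'` for bet or call) and all names are assumptions of this sketch.

```python
from itertools import permutations

CARDS = "JQK"
CHANCE = 1.0 / 6.0  # probability of each of the six deals
TERMINAL = {"pp", "bp", "bb", "pbp", "pbb"}  # 'p' = check/fold, 'b' = bet/call


def utility_p1(cards, h):
    """Utility for player 1 at terminal history h (player 2 gets the negative)."""
    if h == "bp":    # player 2 folds to a bet
        return 1.0
    if h == "pbp":   # player 1 folds to a bet
        return -1.0
    stake = 1.0 if h == "pp" else 2.0
    return stake if CARDS.index(cards[0]) > CARDS.index(cards[1]) else -stake


class Node:
    """Storage for one information set: R(I, a), s(I, a), current strategy."""

    def __init__(self):
        self.regret = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]
        self.current = [0.5, 0.5]

    def update_current(self):
        # Regret matching; uniform when no regret is positive.
        positive = [max(r, 0.0) for r in self.regret]
        total = sum(positive)
        self.current = [p / total for p in positive] if total > 0 else [0.5, 0.5]

    def average(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


def cfr(nodes, cards, h, reach):
    """Return player 1's expected utility from h; reach[i] is player i's
    own contribution to the probability of reaching h."""
    if h in TERMINAL:
        return utility_p1(cards, h)
    player = len(h) % 2
    node = nodes.setdefault(cards[player] + h, Node())
    sigma = node.current
    action_values = []
    for a, action in enumerate("pb"):
        next_reach = list(reach)
        next_reach[player] *= sigma[a]
        action_values.append(cfr(nodes, cards, h + action, next_reach))
    node_value = sum(s * v for s, v in zip(sigma, action_values))
    sign = 1.0 if player == 0 else -1.0          # utility to the acting player
    opponent_reach = CHANCE * reach[1 - player]  # chance and opponent only
    for a in range(2):
        node.regret[a] += opponent_reach * sign * (action_values[a] - node_value)
        node.strategy_sum[a] += reach[player] * sigma[a]
    return node_value


def train(iterations):
    nodes = {}
    for _ in range(iterations):
        for cards in permutations(CARDS, 2):
            cfr(nodes, cards, "", [1.0, 1.0])
        for node in nodes.values():
            node.update_current()
    return nodes


nodes = train(1000)
call_with_jack = nodes["Jb"].average()[1]   # player 2 holds J and faces a bet
fold_with_king = nodes["Kb"].average()[0]   # player 2 holds K and faces a bet
print(call_with_jack, fold_with_king)
```

Information sets are keyed by the acting player's card followed by the betting history, so `"Jb"` is player 2 holding the Jack after a bet. In the average profile, calling with the Jack (which always loses) and folding with the King (which always wins) both end up with probability close to zero.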
Since all players are minimizing average regret, it follows by Theorem 1
that for zero-sum games, CFR's average profile converges to a Nash
equilibrium. For non-zero-sum games, if we assign probability $1/T$ to each of
the profiles $\{\sigma^1, \ldots, \sigma^T\}$ generated by CFR or any other
regret minimizer, then by equation (2) and minimization of regret, this
converges to a coarse correlated equilibrium. Though previous work omits this
fact, it is unclear how this could be useful, let alone why the average
strategy $\bar{\sigma}^T$ might be valuable. However, the average strategy has
been shown to perform well empirically in non-zero-sum games against human
opponents and competitors in the ACPC [2, 3, 5]. One of our aims in this paper
is to help explain why CFR is performing well in non-zero-sum games.

3.2. Other Regret Minimization Concepts and Techniques 

There are two other solution concepts associated with the notion of regret
minimization. Both concepts define the regret of a strategy $\sigma_i$ to be

$$\mathrm{regret}_i(\sigma_i) = \max_{\sigma'_i \in \Sigma_i,\ \sigma_{-i} \in \Sigma_{-i}} u_i(\sigma'_i, \sigma_{-i}) - u_i(\sigma_i, \sigma_{-i}).$$

Firstly, Renou and Schlag [23] define $\sigma^* \in \Sigma$ as a minimax
regret equilibrium relative to $\Sigma$ if

$$\mathrm{regret}_i(\sigma^*_i) \leq \mathrm{regret}_i(\sigma_i) \text{ for all } \sigma_i \in \Sigma_i \text{ and all } i \in N.$$

This turns out to be an even stronger condition than Nash equilibrium, which
is already hard to compute in games with more than two players. The authors
also define the $\epsilon$-minimax regret equilibrium variant where with
probability $1 - \epsilon$ the opponents are assumed to play according to the
equilibrium, and with probability $\epsilon$ no assumption is made. Here, the
common assumption of rationality is dropped and thus $\epsilon$-minimax regret
equilibria can end up playing iteratively strictly dominated strategies
[23, p. 276].

Secondly, Halpern and Pass [24] introduce iterated regret minimization. 
Much like iterated removal of dominated strategies, the authors iteratively 
remove all strategies $\sigma_i$ that do not provide minimal $\mathrm{regret}_i(\sigma_i)$. They show
that while the set of non-iteratively strictly dominated strategies can be 
disjoint from those that survive iterated regret minimization, their solutions 
match closely to those solutions played by real people in a number of small 
games. Our work here is less concerned with understanding how humans 
arrive at solutions and more concerned with understanding and advancing 
CFR in developing state-of-the-art game-playing agents. 
4. Dominated Actions

Our contributions in this paper begin with a formal definition of domi- 
nated actions that are specific to extensive-form games, and we relate such 
actions to dominated strategies. Dominated actions are considered in the 
Gambit Software Tools package and are loosely defined as actions that are 
"always worse than another, regardless of the beliefs at the information set" 
[25]. Here, we say an action $a$ at $I \in \mathcal{I}_i$ is a strictly
dominated action if there exists a strategy $\sigma'_i$ that guarantees higher
counterfactual value at $I$ than any other strategy $\sigma_i$ that always
plays $a$ at $I$, regardless of what the opponents play but assuming they
reach $I$ with positive probability. The formal definition is below.

Definition 2. An action $a \in A(I)$ of an extensive-form game is a strictly
dominated action if there exists a strategy $\sigma'_i \in \Sigma_i$ such that
for all profiles $\sigma \in \Sigma$ satisfying
$\sum_{h \in I} \pi^\sigma_{-i}(h) > 0$, we have
$v_i(I, \sigma_{(I \to a)}) < v_i(I, (\sigma'_i, \sigma_{-i}))$.

We use the counterfactual value $v_i$ instead of $u_i$ in Definition 2 because
we are only concerned with the utility to player $i$ from $I$ onwards rather
than over the entire game. Similar to iteratively dominated strategies, we
also define an iteratively strictly dominated action as one that is either
strictly dominated or becomes strictly dominated after successively removing
strictly dominated actions from the players' action sets. Analogous to
strategic dominance in Definition 1, weak and very weak action dominance
allow equality rather than strict inequality for all but one profile $\sigma$
and for all profiles respectively. In addition, weak and very weak action
dominance do not require the condition that
$\sum_{h \in I} \pi^\sigma_{-i}(h) > 0$.

For example, consider again Kuhn Poker defined in Section 2. When
player 2 is faced with a bet from player 1, calling the bet when holding the
Jack is a strictly dominated action. This is because the Jack is the worst card
and thus never wins regardless of player 1's private card. Similarly, folding
with the King is a strictly dominated action. Note that a strategy that plays
either of these actions with positive probability is not necessarily a strictly
dominated strategy (but is a weakly dominated strategy, as Hoehn et al. [26]
conclude) because there exist player 1 strategies that never bet. In addition,
once these two actions are removed, one can check that player 1's action of
betting with the Queen is iteratively strictly dominated. Since player 2 now
only folds with the Jack and only calls with the King, it is strictly better for
player 1 to always check with the Queen and then call a player 2 bet with
probability 2/3. Thus, iteratively strictly dominated actions can identify
errors that iteratively strictly dominated strategies cannot.
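
The first claim in this example can be checked numerically against Definition 2. The sketch below computes player 2's counterfactual values for folding and calling when holding the Jack and facing a bet, for an arbitrary player 1 betting strategy. The function name and the `bet_prob` dictionary are assumptions for illustration; utilities are net chips, so folding costs the ante and calling with the Jack costs the ante plus the bet.

```python
CHANCE = 1.0 / 6.0  # probability of each of the six possible deals


def jack_facing_bet_values(bet_prob):
    """Counterfactual values for player 2 holding the Jack after a player 1 bet.

    bet_prob[c]: probability that player 1 bets when holding card c.
    Returns (v_fold, v_call, opponent_reach), where opponent_reach is the sum
    over histories h in the information set of chance's and player 1's
    contribution to reaching h.
    """
    opponent_reach = sum(CHANCE * bet_prob[c] for c in "QK")
    v_fold = opponent_reach * -1.0   # folding forfeits the ante
    v_call = opponent_reach * -2.0   # the Jack always loses the showdown
    return v_fold, v_call, opponent_reach


for bet_prob in ({"Q": 0.3, "K": 1.0}, {"Q": 0.0, "K": 0.01}, {"Q": 0.0, "K": 0.0}):
    print(bet_prob, jack_facing_bet_values(bet_prob))
```

Whenever player 1 bets with positive probability, calling is strictly worse than folding. When player 1 never bets, both values are zero, which is why Definition 2 restricts attention to profiles that reach the information set with positive probability.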

Proposition 1 below states a fundamental relationship between dominated
actions and strategies. Any strategy that plays to reach information set $I$
($\pi_i^\sigma(I) > 0$) and plays a weakly dominated action $a$ at $I$
($\sigma_i(I, a) > 0$) is a weakly dominated strategy. Since strictly
dominated actions are also weakly dominated, it follows from Proposition 1
that any strategy that plays a strictly dominated action is a weakly dominated
strategy. We provide a proof sketch of the proposition below, while full
proofs can be found in Appendix A.

Proposition 1. If $a$ is a weakly dominated action at $I \in \mathcal{I}_i$
and $\sigma_i \in \Sigma_i$ satisfies $\pi_i^\sigma(I) \sigma_i(I, a) > 0$,
then $\sigma_i$ is a weakly dominated strategy.

Proof Sketch. By definition of action dominance, there exists a strategy
$\sigma'_i \in \Sigma_i$ such that
$v_i(I, \sigma_{(I \to a)}) \leq v_i(I, (\sigma'_i, \sigma_{-i}))$ for all
opponent profiles $\sigma_{-i} \in \Sigma_{-i}$. One can then construct a
strategy $\sigma''_i$ that follows $\sigma_i$ everywhere except within the
subtree rooted at $I$, where instead we follow a mixture of $\sigma_i$ and
$\sigma'_i$. The weight in this mixture assigned to $\sigma'_i$ is
$(1 - \sigma_i(I, a)) > 0$. The strategy $\sigma_i$ is then weakly dominated
by $\sigma''_i$. □

It is possible, however, for a dominated strategy to not play any domi- 
nated actions. For example, consider the zero-sum extensive-form game in 
Figure 2 where both players take two private actions. The pure strategy for 
player 1 of playing b and then e is strictly dominated by the pure strategy 
that plays a and then e because the latter strategy guarantees exactly 1 more 
utility than the former, regardless of how player 2 plays. Similarly, the pure 
strategy that plays a and then f is strictly dominated by the pure strategy
that plays b and then f. However, no action is even weakly dominated. For
instance, after playing a (or b), the utility player 1 receives for playing e can
be greater than, equal to, or less than the utility for playing f depending on how
player 2 plays. 

5. Theoretical Analysis 

Clearly, one should never play a strictly dominated action or strategy as
there always exists a better alternative. Furthermore, if we make the common
assumption that our opponents are rational and do not play strictly
dominated actions or strategies themselves, then we should never play
iteratively strictly dominated actions or strategies. In zero-sum games, CFR
converges to a Nash equilibrium, and so the average profile is guaranteed to
eliminate strictly dominated strategies. For non-zero-sum games, however,
Abou Risk and Szafron [2] demonstrate that CFR may not converge to a
Nash equilibrium. In this section, we provide theoretical evidence that CFR
does eliminate (i.e., play with probability zero) strictly dominated actions
and strategies.

Figure 2: A zero-sum extensive-form game with strictly dominated strategies, but no
strictly or weakly dominated actions. Nodes connected by a dashed line are in the same
information set. Terminal values indicate utilities for player 1.

We begin by showing that in normal-form games, a class of regret
minimization algorithms, including regret matching, all remove iteratively
strictly dominated strategies. This is a simple result that, to our knowledge,
was previously unknown. Recall that the support of a strategy $\sigma_i$,
$\mathrm{supp}(\sigma_i)$, is the set of actions assigned positive probability
by $\sigma_i$.

Theorem 2. Let $\sigma^1, \sigma^2, \ldots$ be a sequence of strategy profiles
in a normal-form game where all players' strategies are computed by regret
minimization algorithms where for all $i \in N$, $a \in A_i$, if
$R_i^T(a) < 0$ and $R_i^T(a) < \max_{b \in A_i} R_i^T(b)$, then
$\sigma_i^{T+1}(a) = 0$. If $\sigma_i$ is an iteratively strictly dominated
strategy, then there exists an integer $T_0$ such that for all $T \geq T_0$,
$\mathrm{supp}(\sigma_i) \not\subseteq \mathrm{supp}(\sigma_i^T)$.

Proof Sketch. For the non-iterative dominance case, by strict domination
of $\sigma_i$, there exists another strategy $\sigma'_i \in \Sigma_i$ such
that

$$\epsilon = \min_{a_{-i} \in A_{-i}} u_i(\sigma'_i, a_{-i}) - u_i(\sigma_i, a_{-i}) > 0.$$

One can then show that there exists an action
$a \in \mathrm{supp}(\sigma_i)$ such that

$$R_i^T(a) \leq -\epsilon T + \max_{b \in A_i} R_i^T(b) \leq -\epsilon T + R_i^{T,+}.$$

Since $R_i^{T,+}/T \to 0$ as $T \to \infty$, it follows that $R_i^T(a) < 0$
after some finite number of iterations $T_0$. By our assumption, this implies
$a \notin \mathrm{supp}(\sigma_i^T)$ for all $T \geq T_0$ as desired. Using
the fact that new iterative dominances only arise from removing actions and
never from removing mixed strategies [16], iterative dominance is proven by
induction on the finite number of iteratively dominated pure strategies that
must first be removed to exhibit domination of $\sigma_i$. □

Note that regret matching is a regret minimization algorithm that satisfies
the conditions required by Theorem 2, as long as when the denominator of
equation (3) is zero, we choose $\sigma_i^{T+1}(a) = 0$ when
$R_i^T(a) < \max_{b \in A_i} R_i^T(b)$. Also, if a pure strategy
$\sigma_i(a) = 1$ is iteratively strictly dominated, then Theorem 2 implies
that $\sigma_i^T$ never plays action $a$ after a finite number of iterations.
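
As an illustration of Theorem 2, the following minimal sketch runs regret matching for both players of a small normal-form game. The payoff matrices `U1` and `U2`, the function names, and the iteration count are assumptions made for this example and do not come from the paper. Row action 2 is strictly dominated by row action 0, and the sketch records the last iteration at which the current strategy of player 1 assigns it positive probability.

```python
import numpy as np

# Hypothetical 3x2 bimatrix game (not taken from the paper).
# For player 1, row action 2 is strictly dominated by row action 0.
U1 = np.array([[3.0, 1.0],
               [0.0, 2.0],
               [2.0, 0.0]])
U2 = np.array([[1.0, 2.0],
               [3.0, 0.0],
               [0.0, 1.0]])

def regret_matching(regret):
    """Play in proportion to positive cumulative regret. When no regret is
    positive, spread probability only over the actions of maximum regret,
    which is the tie-breaking rule assumed in the remark after Theorem 2."""
    positive = np.maximum(regret, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    best = (regret == regret.max()).astype(float)
    return best / best.sum()

def run(iterations=2000):
    r1 = np.zeros(3)   # cumulative regrets of player 1
    r2 = np.zeros(2)   # cumulative regrets of player 2
    last_played = 0    # last iteration where the dominated row had mass
    for t in range(1, iterations + 1):
        s1 = regret_matching(r1)
        s2 = regret_matching(r2)
        if s1[2] > 0.0:
            last_played = t
        v1 = U1 @ s2           # expected utility of each row action
        v2 = s1 @ U2           # expected utility of each column action
        r1 += v1 - s1 @ v1     # regret update for player 1
        r2 += v2 - s2 @ v2     # regret update for player 2
    return last_played, r1

last_played, final_regret = run()
```

In this toy game the regret of the dominated row always trails the regret of the dominating row by exactly the number of iterations played, so it becomes and stays negative after a short initial phase.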

We now turn our attention to extensive-form games, which are our pri- 
mary concern. Here, the linear program (1) cannot be applied to find non- 
iteratively strictly dominated strategies in even moderately-sized extensive- 
form games as the programs would require a number of constraints exponen- 
tial in the size of the game. On the other hand, we can apply CFR. 

First, we consider the removal of iteratively strictly dominated actions.
Our results rely on two conditions. Let $x^T$ be the number of iterations $t$,
$1 \le t \le T$, where $\sum_{a \in A(I)} R_i^{t,+}(I,a) = 0$ for some
$i \in N$ and $I \in \mathcal{I}_i$. The first condition we require is that
$x^T$ be sublinear in $T$. Intuitively, this is necessary because otherwise,
the denominator of equation (5) is zero too often, and so regret matching too
often yields an arbitrary strategy at some $I \in \mathcal{I}_i$ that
potentially plays a dominated action. While we cannot prove that this condition
always holds, we show empirically that $x^T/T$ decreases over time in the next
section. Next, for $I \in \mathcal{I}_i$ and $\delta > 0$, define
$\Sigma_\delta(I) = \{\sigma \in \Sigma \mid \sum_{h \in I} \pi_{-i}^{\sigma}(h) \ge \delta\}$
to be the set of profiles where the probability that the opponents play to
reach $I$, $\sum_{h \in I} \pi_{-i}^{\sigma}(h)$, is at least $\delta$. The
second condition we require is that the opponents reach each information set
$I$ containing a dominated action often enough, meaning that there exist real
numbers $\delta, \gamma > 0$ and an integer $T_0$ such that for all
$T \ge T_0$, $|\Sigma_\delta(I) \cap \{\sigma^t \mid T_0 \le t \le T\}| \ge \gamma T$.
This condition appears necessary because the magnitude of the counterfactual
regret
$|r_i^t(I,a)| = |v_i(I, \sigma^t_{(I \to a)}) - v_i(I, \sigma^t)| \le \Delta_i \sum_{h \in I} \pi_{-i}^{\sigma^t}(h)$
is weighted by the probability of the opponents reaching $I$. Thus, if the
opponents reach $I$ with probability zero, then we will stop learning how to
adjust our strategy. Since it could take several iterations to eliminate an
iteratively strictly dominated action, we may end up stuck playing such an
action when $I$ is not reached by the opponents often enough.

Theorem 3. Let $\sigma^1, \sigma^2, \ldots$ be strategy profiles generated by
CFR in an extensive-form game, let $I \in \mathcal{I}_i$, and let $a$ be an
iteratively strictly dominated action at $I$, where removal in sequence of the
iteratively strictly dominated actions $a_1, \ldots, a_k$ at
$I_1, \ldots, I_k$ respectively yields iterative dominance of $a_{k+1} = a$.
If for $1 \le \ell \le k+1$, there exist real numbers
$\delta_\ell, \gamma_\ell > 0$ and an integer $T_\ell$ such that for all
$T \ge T_\ell$,
$|\Sigma_{\delta_\ell}(I_\ell) \cap \{\sigma^t \mid T_\ell \le t \le T\}| \ge \gamma_\ell T$,
then

(i) there exists an integer $T_0$ such that for all $T \ge T_0$,
$R_i^T(I,a) < 0$,

(ii) if $\lim_{T \to \infty} x^T/T = 0$, then
$\lim_{T \to \infty} y^T(I,a)/T = 0$, where $y^T(I,a)$ is the number of
iterations $1 \le t \le T$ satisfying $\sigma^t(I,a) > 0$, and

(iii) if $\lim_{T \to \infty} x^T/T = 0$, then
$\lim_{T \to \infty} \pi_i^{\bar\sigma^T}(I)\, \bar\sigma_i^T(I,a) = 0$.

Proof Sketch. Similar to the proof of Theorem 2, there exists an
$\epsilon > 0$ and a term $F$ such that

$$R_i^T(I,a) \le -\epsilon T + F \quad \text{where} \quad \lim_{T \to \infty} \frac{F}{T} = 0.$$

Again, this implies that there exists an integer $T_0$ such that for all
$T \ge T_0$, $R_i^T(I,a) < 0$, establishing part (i). Since CFR applies regret
matching at $I$, part (i) and equation (3) imply that for all $T \ge T_0$,
either $\sum_{b \in A(I)} R_i^{T,+}(I,b) = 0$ or $\sigma_i^{T+1}(I,a) = 0$.
From this, we have

$$\lim_{T \to \infty} \frac{y^T(I,a)}{T} \le \lim_{T \to \infty} \frac{y^{T_0}(I,a) + x^T}{T} = 0,$$

proving part (ii). Finally, part (iii) follows according to

$$\lim_{T \to \infty} \pi_i^{\bar\sigma^T}(I)\, \bar\sigma_i^T(I,a) = \lim_{T \to \infty} \frac{\sum_{t=1}^{T} \pi_i^{\sigma^t}(I)\, \sigma_i^t(I,a)}{T} \le \lim_{T \to \infty} \frac{y^T(I,a)}{T} = 0,$$

where the first equality is by the definition of the average strategy and the
inequality is by definition of $y^T(I,a)$. □

Part (iii) of Theorem 3 says that an iteratively strictly dominated action
is not reached or is removed from the average profile $\bar\sigma^T$ in the
limit, whereas part (i) suggests that iteratively strictly dominated actions
are removed from the current profile $\sigma^T$ after just a finite number of
iterations (except possibly when $\sum_{a \in A(I)} R_i^{T,+}(I,a) = 0$).
Finally, part (ii) states that the number of current profiles that play an
iteratively strictly dominated action $a$ at $I$, $y^T(I,a)$, is sublinear in
$T$.

Next, we show that the profiles generated by CFR eliminate all iteratively
strictly dominated strategies, assuming again that $x^T/T \to 0$.

Theorem 4. Let $\sigma^1, \sigma^2, \ldots$ be strategy profiles generated by
CFR in an extensive-form game, and let $\sigma_i$ be an iteratively strictly
dominated strategy. Then,

(i) there exists an integer $T_0$ such that for all $T \ge T_0$, there exist
$I \in \mathcal{I}_i$, $a \in A(I)$ such that
$\pi_i^{\sigma}(I)\, \sigma_i(I,a) > 0$ and $R_i^T(I,a) < 0$, and

(ii) if $\lim_{T \to \infty} x^T/T = 0$, then
$\lim_{T \to \infty} y^T(\sigma_i)/T = 0$, where $y^T(\sigma_i)$ is the number
of iterations $1 \le t \le T$ satisfying
$\mathrm{supp}(\sigma_i) \subseteq \mathrm{supp}(\sigma_i^t)$.

Proof Sketch. For $\sigma_i \in \Sigma_i$, define

$$R_{i,\mathrm{full}}^T(\sigma_i) = \sum_{t=1}^{T} \left( u_i(\sigma_i, \sigma_{-i}^t) - u_i(\sigma^t) \right).$$

Similar to the proof of Theorems 2 and 3, there exists an $\epsilon > 0$ and a
term $F'$ such that

$$R_{i,\mathrm{full}}^T(\sigma_i) \le -\epsilon T + F' \quad \text{where} \quad \lim_{T \to \infty} \frac{F'}{T} = 0. \tag{7}$$

Next, one can show that

$$R_{i,\mathrm{full}}^T(\sigma_i) = \sum_{I \in \mathcal{I}_i} \pi_i^{\sigma}(I) \sum_{a \in A(I)} \sigma_i(I,a)\, R_i^T(I,a). \tag{8}$$

Since $\pi_i^{\sigma}(I), \sigma_i(I,a) \ge 0$, it follows by equations (7) and
(8) that after a finite number of iterations $T_0$, there exist
$I \in \mathcal{I}_i$, $a \in A(I)$ such that
$\pi_i^{\sigma}(I)\, \sigma_i(I,a) > 0$ and $R_i^T(I,a) < 0$, establishing part
(i). Part (ii) then follows as in the proof of part (ii) of Theorem 3. □

Figure 3: A two-player non-zero-sum extensive-form game where each player has a single
information set.

Similar to part (i) of Theorem 3, part (i) of Theorem 4 says that after
a finite number of iterations, there is always some information set $I$ that
the dominated strategy $\sigma_i$ plays to reach and some action at $I$ played
by $\sigma_i$ which $\sigma_i^T$ does not play (except possibly when
$\sum_{a \in A(I)} R_i^{T,+}(I,a) = 0$), and so $\sigma_i^T \ne \sigma_i$.
Part (ii) similarly states that the number of profiles generated whose support
contains $\sigma_i$, $y^T(\sigma_i)$, is sublinear in $T$. Notice that
Theorems 2 and 4 do not draw any conclusions upon the average profile
$\bar\sigma^T$. Perhaps surprisingly, it is possible to have a sequence of
profiles with no regret where the average profile converges to a strictly
dominated strategy. Consider the two-player non-zero-sum game in Figure 3. The
sequence of pure strategy profiles (A, a), (B, b), (A, a), (B, b), ... has no
positive regret for either player, and in the limit, the average profile for
player 1, $\bar\sigma_1^T$, plays A and B each with probability 0.5. However,
$\bar\sigma_1^T$ is strictly dominated by the pure strategy that always plays
C.
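
The phenomenon can be checked numerically on a small stand-in game. The sketch below is only an illustration: the payoff matrices `U1` and `U2` are hypothetical and are not the payoffs of Figure 3. They are chosen so that the alternating sequence (A, a), (B, b), ... leaves both players with no positive external regret, while player 1's average strategy is strictly dominated by C.

```python
import numpy as np

# Hypothetical payoffs (NOT those of Figure 3).
# Rows are player 1's actions A, B, C; columns are player 2's actions a, b.
U1 = np.array([[3.0, 0.0],
               [0.0, 3.0],
               [2.0, 2.0]])
U2 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])

def external_regrets(T):
    """Cumulative external regret of every action for the pure profile
    sequence (A,a), (B,b), (A,a), ... of length T."""
    r1 = np.zeros(3)
    r2 = np.zeros(2)
    for t in range(T):
        row = col = t % 2                      # (A,a) on even t, (B,b) on odd t
        r1 += U1[:, col] - U1[row, col]        # gain from deviating to each row
        r2 += U2[row, :] - U2[row, col]        # gain from deviating to each column
    return r1, r2

r1, r2 = external_regrets(1000)
avg1 = np.array([0.5, 0.5, 0.0])               # player 1's average strategy
value_of_avg = avg1 @ U1                       # its payoff against a and against b
value_of_C = U1[2]                             # payoff of always playing C
```

Every entry of `r1` and `r2` is nonpositive, yet `value_of_C` exceeds `value_of_avg` against both columns, so the average strategy is strictly dominated even though the sequence has no regret.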

Our final theoretical contribution shows that in two-player non-zero-sum 
games, regret minimization yields a bound on the average strategy profile's 
distance from being a Nash equilibrium. 

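Theorem 5. Let $\epsilon, \delta \ge 0$ and let
$\sigma^1, \sigma^2, \ldots, \sigma^T$ be strategy profiles in a two-player
game. If $R_i^T/T \le \epsilon$ for $i = 1, 2$, and $|u_1 + u_2| \le \delta$,
then $\bar\sigma^T$ is a $2(\epsilon + \delta)$-Nash equilibrium.

Proof. We generalize the proof of [18, p. 11]. For $i = 1, 2$, by the
definition of regret, we have

$$\epsilon \ge \frac{1}{T} \max_{\sigma_i' \in \Sigma_i} \sum_{t=1}^{T} \left( u_i(\sigma_i', \sigma_{-i}^t) - u_i(\sigma^t) \right) = \max_{\sigma_i' \in \Sigma_i} u_i(\sigma_i', \bar\sigma_{-i}^T) - \frac{1}{T} \sum_{t=1}^{T} u_i(\sigma^t)$$

by linearity of expectation. Summing the two inequalities for $i = 1, 2$ gives

$$\begin{aligned}
2\epsilon &\ge \max_{\sigma_1' \in \Sigma_1} u_1(\sigma_1', \bar\sigma_2^T) + \max_{\sigma_2' \in \Sigma_2} u_2(\bar\sigma_1^T, \sigma_2') - \frac{1}{T} \sum_{t=1}^{T} \left( u_1(\sigma^t) + u_2(\sigma^t) \right) \\
&\ge \max_{\sigma_1' \in \Sigma_1} u_1(\sigma_1', \bar\sigma_2^T) + \max_{\sigma_2' \in \Sigma_2} \left( -u_1(\bar\sigma_1^T, \sigma_2') - \delta \right) - \delta \\
&= \max_{\sigma_1' \in \Sigma_1} u_1(\sigma_1', \bar\sigma_2^T) - \min_{\sigma_2' \in \Sigma_2} u_1(\bar\sigma_1^T, \sigma_2') - 2\delta \\
&\ge \max_{\sigma_1' \in \Sigma_1} u_1(\sigma_1', \bar\sigma_2^T) - u_1(\bar\sigma^T) - 2\delta,
\end{aligned}$$

where the last line follows by setting $\sigma_2' = \bar\sigma_2^T$.
Rearranging terms gives

$$\max_{\sigma_1' \in \Sigma_1} u_1(\sigma_1', \bar\sigma_2^T) \le u_1(\bar\sigma^T) + 2(\epsilon + \delta).$$

Applying the same arguments but reversing the roles of the two players gives

$$\max_{\sigma_2' \in \Sigma_2} u_2(\bar\sigma_1^T, \sigma_2') \le u_2(\bar\sigma^T) + 2(\epsilon + \delta),$$

and thus by definition $\bar\sigma^T$ is a $2(\epsilon + \delta)$-Nash
equilibrium. □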

Theorem 5 is a generalization of Theorem 1. When $\delta = 0$, the game is
zero-sum, and so the average profile converges to equilibrium as
$\epsilon \to 0$. In addition, when the players' utilities sum to at most
$\delta > 0$ in absolute value, then as $\epsilon \to 0$, the average profile
converges to a $2\delta$-Nash equilibrium.
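
The bound of Theorem 5 can be sanity-checked on a toy normal-form game. The following sketch is an assumption-laden illustration rather than an experiment from the paper: the game (`U1`, `U2`), the use of regret matching as the regret minimizer, and the helper names are all made up for this example. It measures the average external regret, the largest absolute utility sum, and how much either player can gain by deviating from the average profile.

```python
import numpy as np

# Hypothetical near-zero-sum bimatrix game (not from the paper):
# perturbed matching pennies with |u1 + u2| <= 0.5 at every outcome.
U1 = np.array([[ 1.0, -1.0],
               [-1.0,  1.0]])
U2 = -U1 + np.array([[0.5,  0.0],
                     [0.0, -0.5]])

def regret_matching(regret):
    """Strategy proportional to positive regret, uniform if none is positive."""
    positive = np.maximum(regret, 0.0)
    total = positive.sum()
    return positive / total if total > 0.0 else np.full(len(regret), 1.0 / len(regret))

def check_bound(T=5000):
    r1, r2 = np.zeros(2), np.zeros(2)        # cumulative regrets
    sum1, sum2 = np.zeros(2), np.zeros(2)    # sums of current strategies
    for _ in range(T):
        s1, s2 = regret_matching(r1), regret_matching(r2)
        v1, v2 = U1 @ s2, s1 @ U2            # per-action expected utilities
        r1 += v1 - s1 @ v1
        r2 += v2 - s2 @ v2
        sum1 += s1
        sum2 += s2
    avg1, avg2 = sum1 / T, sum2 / T          # average profile
    eps = max(r1.max(), r2.max(), 0.0) / T   # upper bound on average regret
    delta = np.abs(U1 + U2).max()            # bound on |u1 + u2|
    gain1 = (U1 @ avg2).max() - avg1 @ U1 @ avg2   # best deviation, player 1
    gain2 = (avg1 @ U2).max() - avg1 @ U2 @ avg2   # best deviation, player 2
    return eps, delta, max(gain1, gain2)

eps, delta, gain = check_bound()
```

Under these assumptions the theorem guarantees `gain <= 2 * (eps + delta)`; how far below the bound the measured gain falls depends on the game and on the number of iterations.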

5.1. Remarks 

Theorems 2, 3, and 4 provide evidence that regret minimization removes 
iterative strict domination. Of course, eliminating strict domination may 
not provide any useful insights in games where few strategies are iteratively
strictly dominated. Despite this obvious limitation, Theorems 3 and 4 provide
a better understanding of the strategies generated by CFR in non-zero-sum
games than what coarse correlated equilibria provide. In the next section, we
show that avoiding iteratively dominated actions is enough to perform well in
Kuhn Poker. However, large games such as three-player Texas
Hold'em are too complex to analyze action and strategic dominance beyond 
obvious errors, such as folding the best hand. It remains open as to how well 
our theory explains the success of CFR in these large games. 

Perhaps more importantly, the theory developed here has guided us to a 
more efficient adaptation of CFR, in both time and memory, for games with 
more than two players. Given Theorems 3 and 4 and given we have only finite 
time, we suggest using the current profile in practice rather than the average. 
In fact, while Theorem 5 says that the average profile converges to a $2\delta$-Nash
equilibrium in two-player games, there is no clear case for preferring the 
average over the current profile in three-or-more-player games. Furthermore, 
the average profile is not used in any computations during CFR, so when 
discarding the average, there is no reason to store the cumulative profile. 
This reduces the memory requirements of CFR by a factor of two, since then 
only one value per information set, action pair ($R_i^T(I,a)$) must be stored
as opposed to two. Not only does this allow us to tackle larger games, the 
extra memory might be utilized to compute even stronger strategies than 
previously possible. 

We are not the first to consider using the current profile. In CFR-BR, a 
recently developed CFR variant for zero-sum games that replaces one player 
with a worst-case opponent, the current profile converges to equilibrium with 
high probability [27, Theorem 4]. The authors discuss similar benefits when 
discarding the cumulative profile in CFR-BR and just using the current strat- 
egy profile. Nonetheless, we are the first to suggest using the current profile 
both with the original CFR algorithm and in games with more than two 
players. The next section explores these new insights. 

6. Empirical Study 

Using poker as a testbed, we design several experiments to test our theory 
developed in the previous section. While previous work has applied CFR 
across several domains [22], poker games are of particular interest as they 
are widely popular and many computer agents from past ACPC events are
available to test against. New games can also be easily created by adjusting
the number of players, cards, and betting rounds that take place. 

6.1. Poker Games 

We consider three different poker games for our experiments in this sec- 
tion. The first is Kuhn Poker, which was introduced in Section 2. 

Our second game and our main game of interest is three-player limit Texas 
Hold'em. To begin the game, the player to the left of the dealer posts a small 
blind of five chips, the next player posts a big blind of ten chips, and each 
player is dealt two private cards from a standard 52-card deck. Texas Hold'em 
consists of four betting rounds with three, one, and one public card(s) being 
revealed before the second, third, and fourth rounds respectively. All bets 
and raises are fixed to ten chips in the first two rounds and twenty chips in 
the last two rounds; players may not go all-in as in no-limit poker. There 
is also a maximum of four bets or raises allowed per round. At the end of 
the fourth round, the players that did not fold reveal their hand. The player 
with the highest ranked poker hand made up of any combination of their two 
private cards and five public cards wins all the chips played. 

With three players, limit Texas Hold'em contains approximately $5 \times 10^{17}$
information sets and CFR would require hundreds of petabytes of RAM to 
minimize regret in such a large game. Instead, a common approach is to use 
state-space abstraction to produce a similar game of a tractable size by merg- 
ing information sets or restricting the action space [6, 7]. For Texas Hold'em, 
we merge card deals into buckets so that hands falling into the same bucket 
are indistinguishable. We can then control the size of the abstract game by 
increasing or decreasing the number of buckets used on each round. However, 
increasing abstraction size not only increases memory requirements, but also 
increases the number of iterations required to minimize average regret (see 
equation (6)). There are just three actions (fold, check/call, and bet/raise) 
available in limit Hold'em, and thus we do not abstract on actions. Note 
that applying CFR to an abstraction of Texas Hold'em yields no guarantees
about regret minimization or domination avoidance in the real game (these are
guaranteed only in the abstract game). Furthermore, we will use imperfect recall
abstractions that forget the buckets from previous rounds and break our as- 
sumption of perfect recall stated in Section 2. Despite these complications, 
abstraction and imperfect recall still appear to work well in practice [3, 28]. 

Thirdly, we also consider the game of 2-1 Hold'em [29], which is identical
to Texas Hold'em except that it consists of only the first two betting rounds
and only one raise is allowed per round. Two-player 2-1 Hold'em has roughly
16 million information sets, which is small enough to apply CFR without
abstraction. Furthermore, because full tree traversals in CFR are very
expensive, we instead use sampling variants that only traverse a smaller subset
of information sets on each iteration. We found that the most efficient
variant for 2-1 Hold'em was Public Chance Sampling [29] and for three-player
limit Texas Hold'em was External Sampling [30].

Table 1: Results of a six-agent mock tournament of Kuhn poker. Reported scores for the
row strategy against the column strategy are in expected milli-chips per game, averaged
over both player orderings.

6.2. Dominated Actions and Performance in Kuhn Poker 

To begin, we investigate the correlation between the presence of iteratively
dominated actions in one's strategy and the performance of the strategy in
a mock ACPC-style tournament. In the ACPC, each game is evaluated according
to two different scoring metrics. The total bankroll (TBR) metric
simply ranks competitors according to their overall earnings in money per 
game averaged across all possible opponents. The instant runoff (IRO) met- 
ric, however, ranks competitors by iteratively eliminating the lowest scoring 
agent from consideration and reevaluating the overall scores by averaging 
only across the remaining agents. In a zero-sum game, a Nash equilibrium 
strategy is optimal for winning IRO since it never loses in expectation to any 
opponent. 
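
The two scoring rules can be sketched as follows for a tournament of two-player matches. This is an illustrative reading of the description above, not the ACPC's official scoring code: the cross table `scores` is hypothetical, ties are broken arbitrarily by list order, and the function names are our own.

```python
import numpy as np

def total_bankroll(scores, alive):
    """Average earnings of each remaining agent against the other remaining
    agents, where scores[i][j] is agent i's earnings per game against j."""
    return {i: np.mean([scores[i][j] for j in alive if j != i]) for i in alive}

def instant_runoff(scores):
    """Repeatedly drop the lowest-scoring agent and re-average over the
    survivors. Returns the elimination order followed by the winner."""
    alive = list(range(len(scores)))
    eliminated = []
    while len(alive) > 1:
        tbr = total_bankroll(scores, alive)
        worst = min(alive, key=lambda i: tbr[i])   # ties: first in list order
        alive.remove(worst)
        eliminated.append(worst)
    return eliminated + alive

# Hypothetical antisymmetric cross table for three agents.
scores = np.array([[  0.0, 10.0, -2.0],
                   [-10.0,  0.0,  1.0],
                   [  2.0, -1.0,  0.0]])

tbr = total_bankroll(scores, [0, 1, 2])
tbr_rank = sorted(tbr, key=tbr.get, reverse=True)   # best first
iro_rank = instant_runoff(scores)[::-1]             # best first
```

In this made-up table, agent 0 tops the total bankroll ranking by exploiting the weak agent 1, but agent 2 wins the instant runoff because it beats agent 0 once agent 1 has been eliminated.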

We ran a six-agent mock tournament of Kuhn poker, which was introduced in
Section 2. Kuhn poker is a small enough game that we can easily identify all
iteratively dominated actions, and all Nash equilibrium strategies have
already been classified [8]. Our agents consist of a uniform random
strategy (Uni), a strategy that plays no dominated actions (does not call
with the Jack or fold with the King) but is otherwise uniform random (ND),
a strategy that plays no iteratively dominated actions (no dominated actions
and does not bet with the Queen) but is otherwise uniform random (NID),
and three Nash equilibrium strategies (NE-$\gamma$) for $\gamma = 0, 0.5, 1$,
where $\gamma$ is the probability of betting with the King. A cross table of
the results for each pair of strategies is given in Table 1. For IRO, after
successively eliminating Uni and then ND, there is a four-way tie for first
place between the three equilibrium strategies and NID. In addition, NID
happens to win TBR, though none of the strategies are designed with TBR in
mind. This mock tournament provides one example where high performance can be
achieved by simply avoiding iteratively dominated errors.

Figure 4: Log-log plots measuring the distance from equilibrium of CFR strategies in
$w\%$-tilted 2-1 Hold'em over iterations, for (a) the orange tilt and (b) the green tilt.
Distance is measured in milli-big-blinds per game (mbb/g).

6.3. Distance from Equilibrium in Two-Player Non-Zero-Sum Games 

Our next experiment applies CFR to non-zero-sum tilted variants of two-player
2-1 Hold'em. Tilted games are constructed by rewarding or penalizing players
depending on the outcome of the game. Playing a tilted game's strategy in the
regular, non-tilted game can lead to more aggressive play, and such tilts were
used by the poker program Polaris that won the 2008 Man-vs-Machine competition
[3]. Here, we use the orange tilt that gives the winning player an extra $w\%$
bonus, and the green tilt that both reduces the losing player's loss in a
showdown (i.e., when neither player folded) by $w\%$ and penalizes the winning
player by $w\%$ when the losing player folded. In both of these games, we
can bound $|u_1 + u_2| \le \Delta_i w/100$, and so Theorem 5 states that CFR
will converge to at least a $\Delta_i w/50$-Nash equilibrium. For
$w \in \{0, 7, 14, 35\}$, we ran CFR and measured how far the average profile
was from equilibrium in the $w\%$-tilted game by calculating
$\max_{\sigma_i' \in \Sigma_i} u_i(\sigma_i', \sigma_{-i})$ and averaging over
both players $i = 1, 2$. In addition, we also measured the same value for the
current profile in the non-tilted game ($w = 0$). These results are shown in
Figure 4. As expected, in the non-tilted game ($w = 0$), the average profile
is approaching a Nash equilibrium. For the tilted games, we see that as $w$ is
increased, most of the profiles are further from equilibrium, coinciding with
Theorem 5. However, the strategies are much closer to equilibrium than the
distance guaranteed by Theorem 5 (note that $\Delta_i = 8$ big blinds) and only
in the green tilt with $w = 35$ is it obvious that CFR is not converging to an
exact equilibrium. Of course, Theorem 5 only provides an upper bound on
the average profile's distance from equilibrium, and this bound appears to be
quite loose. These results warrant further investigation into regret
minimization in two-player non-zero-sum games. Finally, it is clear that the
current strategy profile with $w = 0$ is not converging to equilibrium. Thus,
unlike CFR-BR [27], the average profile from CFR is generally preferred to the
current profile in two-player games as it gives a better worst-case guarantee.
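
To make the bound on the utility sum concrete, the sketch below encodes one possible reading of the two tilts as transformations of a terminal payoff. The exact way the bonus and penalty are applied is an assumption (a percentage of the amount won), as are the function names; only the constant of 8 big blinds for the utility range is quoted from the text.

```python
DELTA = 8.0  # utility range in big blinds quoted in the text for 2-1 Hold'em

def orange_tilt(win, w):
    """(winner utility, loser utility): the winner receives a w% bonus.
    Assumes the bonus is a percentage of the amount won."""
    return win * (1 + w / 100), -win

def green_tilt(win, w, showdown):
    """(winner utility, loser utility): at a showdown the loser's loss is
    reduced by w%; after a fold the winner's gain is reduced by w%.
    Assumes both adjustments are percentages of the amount won."""
    if showdown:
        return win, -win * (1 - w / 100)
    return win * (1 - w / 100), -win

def utility_sum_bound(w):
    """Bound on |u1 + u2| used with Theorem 5 in the text."""
    return DELTA * w / 100

# Example: a 4 big blind pot win under each tilt with w = 35.
orange_sum = sum(orange_tilt(4.0, 35))
green_showdown_sum = sum(green_tilt(4.0, 35, showdown=True))
green_fold_sum = sum(green_tilt(4.0, 35, showdown=False))
```

Under this reading the utilities of the two players no longer cancel, and the size of the imbalance is the tilt percentage times the amount won, which is what makes the games non-zero-sum while keeping the sum within the stated bound.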

6.4. Positive Regret and Current Profile in Three-Player Limit Hold'em

Next, we examine how often $\sum_{a \in A(I)} R_i^{T,+}(I,a) = 0$, as required
by parts of Theorems 3 and 4. CFR was applied to two different abstractions of
three-player limit Texas Hold'em. The first, labeled 1X, consists of 169, 900,
100, and 25 buckets per betting round respectively. This abstraction size was
used by the winning agents of the 2010 ACPC 3-player events [5] and contains
about 262 million information sets. The second abstraction, labeled 2X, uses
169, 1800, 200, and 50 buckets per betting round respectively, resulting in an
abstract game approximately twice the size. All of our abstractions were built
off-line using a k-means clustering algorithm on hand strength distribution
described by Johanson et al. [7]. For each abstraction, we measured $\xi^T$,
the total number of times where External Sampling traversed an information
set that had no positive regret at any action. The average of $\xi^T$ is
plotted over iterations $T$ in Figure 5a. In both cases, we see that
encountering an information set with no positive regret becomes less frequent
over time, where we eventually encounter fewer than one such information set
per iteration on average. While we cannot guarantee that
$x^T/T \approx \xi^T/T \to 0$ as required by Theorems 3 and 4, we at least have
evidence that having no positive regret becomes a rare event. By part (i) of
Theorems 3 and 4, this means that iteratively strictly dominated actions and
strategies will likely be avoided in the current strategy profile.

Figure 5: (a) Log-log plot measuring the frequency at which an information set is visited
where every action has nonpositive cumulative counterfactual regret during CFR in the 1X
and 2X abstractions of three-player limit Texas Hold'em. (b) Performance over iterations
(log scale) of three strategy profiles in a four-agent round-robin competition of three-player
limit Texas Hold'em, measured in milli-big-blinds per game. Current-2X is the current
profile generated by CFR in the 2X abstraction that is twice as large as the 1X abstraction
used to generate Average-1X and Current-1X. Error bars indicate 95% confidence intervals
over 50 competitions.

Using these same abstractions of three-player Hold'em, we now show that
the current profile can reach higher performance faster than the average
profile, and that the extra savings in memory acquired by discarding the
average profile can be utilized to generate even stronger strategies. In this
experiment, we generated three different strategy profiles with CFR, saving
the profiles at various iteration counts. For the 1X abstraction, we kept both
the average and the current profile, while for the 2X abstraction, we kept just
the current profile. Note that running CFR on the 2X abstract game without
keeping the average profile requires no more RAM than running CFR on the
1X abstraction and keeping both profiles. For each of our saved profiles, we
then played a four-agent round-robin competition (RRC) against the base
strategy profiles¹ from the top 2009, 2010, and 2011 ACPC three-player
entries. Figure 5b shows the amount won by each of our three strategies over
iterations, averaged over 50 RRCs consisting of 10,000 games per match.
Clearly, the 1X current profile reaches strong play much sooner than the
average profile, which requires about ten times the number of iterations to
reach the same level of performance. Furthermore, while more iterations are
needed in the 2X abstraction as expected by equation (6), we see that 2X
eventually yields a current profile that outperforms both profiles in the 1X
abstraction.
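
The memory comparison between the two setups reduces to simple arithmetic, sketched below. The storage model is an assumption made for illustration: one 8-byte value per stored quantity, three actions at every information set, and a 2X game taken to be exactly twice the size of the 1X game (the text only says approximately twice). The function name is hypothetical.

```python
def cfr_table_bytes(num_info_sets, keep_average, actions_per_set=3, bytes_per_value=8):
    """Rough size of the CFR tables: cumulative regret for every
    (information set, action) pair, plus the cumulative profile if the
    average strategy is kept. All sizes here are illustrative assumptions."""
    values_per_pair = 2 if keep_average else 1
    return num_info_sets * actions_per_set * values_per_pair * bytes_per_value

ONE_X_INFO_SETS = 262_000_000          # approximate 1X size quoted in the text

both_profiles_1x = cfr_table_bytes(ONE_X_INFO_SETS, keep_average=True)
current_only_2x = cfr_table_bytes(2 * ONE_X_INFO_SETS, keep_average=False)
current_only_1x = cfr_table_bytes(ONE_X_INFO_SETS, keep_average=False)
```

Under these assumptions, keeping only the current profile in the 2X game costs the same as keeping both profiles in the 1X game, and dropping the average in a fixed game halves the table size.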

7. A New Champion Three-Player Limit Texas Hold'em Agent 

Finally, this section presents a new three-player limit Texas Hold'em agent 
that won the three-player events of the 2012 ACPC. Before presenting this 
new agent in detail, we summarize the previous competition winners. 

7.1. Previous ACPC Winners 

As we discussed in Section 6.1, abstraction is necessary in order to feasibly 
apply CFR to Texas Hold'em. Despite the loss of theoretical guarantees and 
the existence of abstraction pathologies [31], we generally see increased per- 
formance as we increase the granularity of our abstractions; in other words, 
more buckets are typically better [3, 7]. Abstraction granularity, however, 
is restricted by computational resources as CFR requires space linear in the 
size of the abstract game. 

One approach to improving abstraction granularity is to partition a game
into smaller pieces and run CFR on each piece, either independently [2, 5, 32]
or concurrently [5]. Strategies for each piece are referred to as experts,
which only act during a match when play reaches their piece of the game. The
winner of the 2011 ACPC three-player instant runoff (IRO) event was an
agent built with such expert strategies. Similar to Abou Risk and Szafron's
heads-up experts, the 2011 experts only acted in what appear to be the four
most common two-player scenarios that resulted after one player had folded
[2, Table 4]. In particular, an expert only acted after the opening sequence
of player actions was f, rf, rrf, or rcf, where f denotes the fold action,
c denotes call, and r denotes raise. Two-player scenarios are convenient to
work with since the elimination of one player greatly reduces the number
of possible future action sequences and thus reduces the size of the game.
These experts were computed independently using an abstraction with 169,
180,000, 540,000, and 78,480 buckets on each of the four betting rounds
respectively. Here and throughout this section, the same k-means clustering
technique on hand strength distribution from Section 6.4 was used to bucket
hands and we refer the reader to Johanson et al. [7] for more details. To play
the rest of the game, a base strategy for the full, unpartitioned three-player
game was computed with CFR using an abstraction with 169, 10,000, 5450,
and 500 buckets per round respectively. Thus, the experts could distinguish
between many more different hands compared to the base strategy, even
though the abstract game for the base strategy still contained approximately
5.9 billion information sets. More details about this expert construction
process are found in the description of the 2010 ACPC three-player IRO
winner [5] that was identical to the 2011 agent but used coarser abstractions.
In 2009, the first year of the three-player events, the IRO winner was a simple
base strategy computed with CFR in a very coarse abstraction [2].

¹ The 2010 and 2011 agents employed special experts in some two-player scenarios that
were not used in this specific experiment. More details regarding these agents are provided
later in Section 7.

7.2. A New Three-Player Limit Texas Hold'em Agent

As demonstrated above, partitioning a game into smaller pieces is a conve- 
nient method for increasing abstraction granularity. For the 2012 ACPC, we 
again used this same methodology to construct our new three-player limit 
Texas Hold'em agent. This time, rather than partitioning the game into 
special two-player scenarios, we partitioned the histories into two parts: an 
important part and an unimportant part. The important histories were de- 
fined as follows. First, we scanned all of the 2011 ACPC match logs that the 
winning IRO agent presented above played in and for each betting sequence, 
we calculated the frequency at which the agent was faced with a decision 
at that sequence. For example, the frequency the agent was faced with a 
decision at the empty betting sequence was 1/3 since positions in the game 
rotate, making the agent first to act once in every three hands. Next, we 
multiplied each of these frequencies by the pot size at that betting sequence. 
For instance, we multiplied the 1/3 frequency for the empty betting sequence 
by 15 since the game is played with a small blind of 5 chips and a big blind of 
10 chips, creating an initial pot of 15 chips. For each history, if this value for 
the history's betting sequence was greater than 1/100, then the history was 
labeled as important. Since $(1/3) \cdot 15 = 5 > 1/100$, the empty sequence was
labeled as important. In addition, any prefix of an important history was
also labeled as important, while the remaining histories were labeled as unim- 
portant. Only 0.023% of the nonterminal betting sequences in three-player 
limit Hold'em belonged to the important part. While many of the important 
histories overlapped with the two-player scenarios used by the 2011 agent, 
there were several three-player scenarios, such as the empty sequence and 
the rcc sequence, that were labeled important. 
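
The labeling rule described above can be sketched in a few lines. This is a simplified illustration that works directly on betting sequences rather than full histories; the function name and the `stats` dictionary are hypothetical, and apart from the empty-sequence entry (frequency 1/3, pot 15) the frequencies and pot sizes are made-up numbers, not values from the 2011 match logs.

```python
from fractions import Fraction

def important_sequences(stats, threshold=Fraction(1, 100)):
    """stats maps a betting sequence (a string over 'f', 'c', 'r') to a
    (decision frequency, pot size) pair. A sequence is important when
    frequency * pot exceeds the threshold, and every prefix of an
    important sequence is important as well."""
    important = set()
    for seq, (freq, pot) in stats.items():
        if freq * pot > threshold:
            for end in range(len(seq) + 1):
                important.add(seq[:end])     # includes the empty prefix
    return important

# Only the empty-sequence entry comes from the text; the rest is invented.
stats = {
    "":     (Fraction(1, 3), 15),
    "rcc":  (Fraction(1, 50), 60),
    "rrrc": (Fraction(1, 100000), 110),
    "rrcf": (Fraction(1, 5000), 80),
}

important = important_sequences(stats)
```

With these invented numbers the rarely reached sequence "rrrc" stays unimportant, while "rr" and "rrc" become important only because they are prefixes of "rrcf". Exact fractions are used so that the comparison with 1/100 is not affected by floating-point rounding.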

Using this partition, we employed a very fine-grained abstraction on the 
important part and a coarse abstraction on the unimportant part. This way, 
our agent can distinguish between many more hands at the few sequences that 
historically were reached more frequently or that had lots of chips at stake. 
Our coarse abstraction for the unimportant part used the same 169, 1800, 
200, and 50 buckets per round employed by our 2X abstraction in Section 6.4, 
while our fine-grained abstraction for the important part used 169, 180,000, 
765,000, and 840,000 buckets per round respectively. Strategies for both 
parts were computed concurrently [5] across the 2.5 billion information set 
abstract game resulting from the two abstractions. Note that this abstract 
game is less than half the size of the abstract game used to compute the base 
strategy in 2011, meaning that less computer memory was required to run 
CFR. We used a parallel implementation of the External Sampling variant 
of CFR mentioned in Section 6.1, which ran for 16 days using 48 2.1 GHz 
AMD processors on a machine with 256GB of total RAM (though less than 
100GB of RAM were needed). 

7.3. Results 

The 2012 competition results [4] are presented in Table 2. Our 2012 
agent, named Hyperborean3p, won both the IRO and TBR events by significant
margins. In addition, we compared our new agent against the previous IRO 
winners from the 2009, 2010, and 2011 competitions by running a four-agent 
round-robin competition (RRC). Table 3 presents the results averaged across 
10 RRCs. We see that not only does the 2012 agent require less computer 
memory to generate than the 2011 agent, the 2012 agent earns 13 milli-big- 
blinds per game more on average. 

Table 2: Results of the 2012 ACPC three-player limit Hold'em events [4]. Earnings are in
milli-big-blinds per game (mbb/g) and errors indicate 95% confidence intervals.

Total Bankroll

Agent            Total Earnings
Hyperborean3p     28 ± 5
little.rock       -4 ± 7
neo.poker.lab    -11 ± 5
sartre           -12 ± 7

Instant Runoff

Agent            Round 1     Round 2       Round 3
Hyperborean3p     37 ± 5      28 ± 5        23 ± 8
little.rock       13 ± 6      -4 ± 7        -9 ± 9
neo.poker.lab      7 ± 5     -11 ± 5       -14 ± 6
sartre             5 ± 7     -12 ± 7       Eliminated
dcubot           -62 ± 8     Eliminated    Eliminated

Table 3: Results of a four-agent RRC between the ACPC IRO three-player winners from
2009, 2010, 2011, and 2012. Earnings are in milli-big-blinds per game for the row player
against the column players and errors indicate 95% confidence intervals.

        09,10      09,11      09,12      10,11      10,12      11,12      Overall
2009                                     -21 ± 7    -26 ± 5    -31 ± 5    -26 ± 4
2010                 0 ± 4     -5 ± 3                          -23 ± 5    -10 ± 4
2011     21 ± 6                 6 ± 5                 8 ± 5                11 ± 4
2012     31 ± 5     25 ± 5                16 ± 4                           24 ± 4

Finally, all of the competition winners from 2009 to 2012 used the average
strategy profiles generated by CFR. In light of our new insights from Section 5
and as a final validation of our CFR modification, we reran CFR on the 2012
abstract game using the same CFR implementation on the same machine,
except now saving the current profile and discarding the average. For several
checkpoints of the original average strategy and the new current strategy,
we played 10 RRCs versus the 2009, 2010, and 2011 ACPC IRO winners
and plotted the results in Figure 6. While the average strategy takes 20
days before earning 25 milli-big-blinds per game, the current strategy reaches
better performance in just 5 days while requiring only half the memory (less
than 50GB of RAM) to compute.
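
The memory saving from discarding the average can be illustrated with a minimal sketch. This is not the implementation used for the competition agents; the class name `RegretTable`, its methods, and the flat per-information-set layout are hypothetical and chosen only for illustration. The sketch assumes a solver that stores one cumulative regret per information set and action and, only when the average profile is wanted, a second table of the same shape holding the cumulative strategy.

```python
class RegretTable:
    """Hypothetical per-information-set storage for a CFR-style solver."""

    def __init__(self, num_info_sets, num_actions, keep_average):
        # Cumulative regrets are always needed to compute the current profile.
        self.regret = [[0.0] * num_actions for _ in range(num_info_sets)]
        # The cumulative strategy is only needed to recover the average profile.
        self.strategy_sum = (
            [[0.0] * num_actions for _ in range(num_info_sets)]
            if keep_average
            else None
        )

    def current_strategy(self, info_set):
        """Regret matching: play in proportion to positive cumulative regret."""
        positive = [max(r, 0.0) for r in self.regret[info_set]]
        total = sum(positive)
        if total > 0.0:
            return [p / total for p in positive]
        # No positive regret: fall back to the uniform strategy.
        return [1.0 / len(positive)] * len(positive)

    def accumulate(self, info_set, reach_prob):
        """Add the reach-weighted current strategy to the cumulative strategy."""
        if self.strategy_sum is None:
            return  # current-profile-only mode keeps no average
        row = self.strategy_sum[info_set]
        for action, prob in enumerate(self.current_strategy(info_set)):
            row[action] += reach_prob * prob

    def average_strategy(self, info_set):
        """Normalize the cumulative strategy to obtain the average strategy."""
        if self.strategy_sum is None:
            raise ValueError("the average profile was discarded")
        row = self.strategy_sum[info_set]
        total = sum(row)
        if total > 0.0:
            return [value / total for value in row]
        return [1.0 / len(row)] * len(row)

    def stored_values(self):
        """Number of floating-point entries held by the table."""
        tables = [self.regret]
        if self.strategy_sum is not None:
            tables.append(self.strategy_sum)
        return sum(len(row) for table in tables for row in table)
```

Under these assumptions, a table built with `keep_average=False` stores exactly half as many values as one built with `keep_average=True`, which matches the factor-of-two memory reduction reported above.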

Figure 6: Performance over time (in days) of the average profile that won the three-player
events of the 2012 ACPC, and of the current profile computed in the same abstract game.
Error bars indicate 95% confidence intervals over 10 competitions versus the top 2009,
2010, and 2011 ACPC IRO three-player agents.



8. Conclusion 

This paper provides the first theoretical advancements for applying CFR
to games that are not two-player zero-sum. While previous work had
demonstrated that CFR does not necessarily converge to a Nash equilibrium in such
games, we have provided theoretical evidence that CFR eliminates iteratively
strictly dominated actions and strategies. Thus, CFR provides a mechanism
for removing iterative strict domination, which was otherwise infeasible with
previous techniques for large, non-zero-sum extensive-form games. In addition,
our theory is a first step toward understanding why CFR generates
well-performing strategies in non-zero-sum games. Though our experiments show
that the current profile reaches a high level of performance faster than the
average, it remains unclear whether this is due to the faster removal of
domination that our theory illustrates. Nonetheless, we have shown that just using
the current profile gives a more time- and memory-efficient implementation of
CFR for games with more than two players that can lead to increased
performance. Furthermore, we presented a new three-player limit Texas Hold'em
agent that won both three-player events of the 2012 Annual Computer Poker
Competition. Our agent uses a new partition of the game tree, requires less
computer memory to generate than the 2011 winner, and outperforms the
previous competition winners by a significant margin.

Future work will look at finding additional properties of CFR in
non-zero-sum games that go beyond domination. Additionally, we would like
to compare CFR's average and current profiles in other large, non-zero-sum
domains outside of poker. Finally, this work has only considered the
problem of computing strategies for play against a set of unknown opponents.
In poker and other repeated games, we often gain information about the
opponents' strategies over time. For repeated non-zero-sum games, using
opponent modelling to adjust one's strategy could drastically improve play.

Appendix A. Proofs of Technical Results

In this appendix, we prove Proposition 1 and Theorems 2, 3, and 4. For
$I \in \mathcal{I}_i$, define

$$D(I) = \{ I' \in \mathcal{I}_i \mid \exists h \in I, h' \in I' \text{ such that } h \sqsubseteq h' \}$$

to be the set of information sets descending from $I$.

Proposition 1. If $a$ is a weakly dominated action at $I \in \mathcal{I}_i$ and $\sigma_i \in \Sigma_i$
satisfies $\pi_i^{\sigma}(I)\,\sigma_i(I,a) > 0$, then $\sigma_i$ is a weakly dominated strategy.

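Proof. Since $a$ is weakly dominated, there exists a strategy $\sigma'_i \in \Sigma_i$ such that
$v_i(I, \sigma_{(I \to a)}) \le v_i(I, (\sigma'_i, \sigma_{-i}))$ for all opponent profiles $\sigma_{-i} \in \Sigma_{-i}$, and there
exists an opponent profile $\sigma'_{-i}$ such that $v_i(I, (\sigma_{i(I \to a)}, \sigma'_{-i})) < v_i(I, (\sigma'_i, \sigma'_{-i}))$.
Let $\hat{\sigma}_i$ be the strategy $\sigma_i$ except at $I$, where $\hat{\sigma}_i(I,a) = 0$ and
$\hat{\sigma}_i(I,b) = \sigma_i(I,b)/(1 - \sigma_i(I,a))$ for all $b \in A(I)$, $b \ne a$. Next, for all $J \in \mathcal{I}_i$ and
$b \in A(J)$, define

$$\sigma''_i(J,b) = \begin{cases}
\dfrac{\pi_i^{\sigma}(I)\left(\sigma_i(I,a)\,\pi_i^{\sigma'}(I,J)\,\sigma'_i(J,b) + (1-\sigma_i(I,a))\,\pi_i^{\hat{\sigma}}(I,J)\,\hat{\sigma}_i(J,b)\right)}{\pi_i^{\sigma}(I)\left(\sigma_i(I,a)\,\pi_i^{\sigma'}(I,J) + (1-\sigma_i(I,a))\,\pi_i^{\hat{\sigma}}(I,J)\right)} & \text{if } J \in D(I) \text{ (and arbitrary when the denominator is zero),} \\[2ex]
\sigma_i(J,b) & \text{if } J \notin D(I).
\end{cases}$$

One can verify that $\sigma''_i \in \Sigma_i$ is a valid strategy for player $i$. Now, fix $\sigma_{-i} \in \Sigma_{-i}$. Then,

$$\begin{aligned}
u_i(\sigma_i, \sigma_{-i}) &= \pi_i^{\sigma}(I) \sum_{b \in A(I)} \sigma_i(I,b)\, v_i(I, \sigma_{(I \to b)}) + \sum_{z \notin Z_I} \pi^{\sigma}(z)\, u_i(z) \\
&\le \pi_i^{\sigma}(I)\, \sigma_i(I,a)\, v_i(I, (\sigma'_i, \sigma_{-i})) \\
&\quad + \pi_i^{\sigma}(I)\,(1 - \sigma_i(I,a)) \sum_{b \in A(I)} \hat{\sigma}_i(I,b)\, v_i(I, (\hat{\sigma}_{i(I \to b)}, \sigma_{-i})) \\
&\quad + \sum_{z \notin Z_I} \pi^{\sigma}(z)\, u_i(z) \\
&= \pi_i^{\sigma}(I)\, v_i(I, (\sigma''_i, \sigma_{-i})) + \sum_{z \notin Z_I} \pi^{\sigma}(z)\, u_i(z) \\
&= u_i(\sigma''_i, \sigma_{-i}).
\end{aligned}$$

Thus, $u_i(\sigma_i, \sigma_{-i}) \le u_i(\sigma''_i, \sigma_{-i})$ for all $\sigma_{-i} \in \Sigma_{-i}$. A similar argument shows
that $u_i(\sigma_i, \sigma'_{-i}) < u_i(\sigma''_i, \sigma'_{-i})$, proving that $\sigma_i$ is weakly dominated by $\sigma''_i$. □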

Next, we prove Theorem 2, using the fact that new iterative dominances 
only arise from removing actions and never from removing mixed strategies 
[16]: 

Theorem 2. Let $\sigma^1, \sigma^2, \ldots$ be a sequence of strategy profiles in a normal-form
game where all players' strategies are computed by regret minimization
algorithms where for all $i \in N$, $a \in A_i$, if $R_i^T(a) < 0$ and $R_i^T(a) < \max_{b \in A_i} R_i^T(b)$,
then $\sigma_i^{T+1}(a) = 0$. If $\sigma_i$ is an iteratively strictly dominated strategy, then there
exists an integer $T_0$ such that for all $T \ge T_0$, $\operatorname{supp}(\sigma_i) \not\subseteq \operatorname{supp}(\sigma_i^T)$.

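Proof. Let $a^1, a^2, \ldots, a^k$ be iteratively strictly dominated actions (pure
strategies) for players $j_1, j_2, \ldots, j_k$ respectively that, once removed in sequence, yield
strict domination of $\sigma_i$. Let $B_{-i} = A_{-i} \setminus \{a^1, a^2, \ldots, a^k\}$ be the set of opponent
actions other than $a^1, a^2, \ldots, a^k$. Next, by iterative strict domination of $\sigma_i$
and because the game is finite, there exists another strategy $\sigma'_i \in \Sigma_i$ such
that

$$\epsilon = \min_{a_{-i} \in B_{-i}} u_i(\sigma'_i, a_{-i}) - u_i(\sigma_i, a_{-i}) > 0,$$

so that $u_i(\sigma_i, a_{-i}) \le u_i(\sigma'_i, a_{-i}) - \epsilon$ for all $a_{-i} \in B_{-i}$. Then,

$$\begin{aligned}
\sum_{a \in A_i} \sigma_i(a) R_i^T(a)
&= \sum_{a \in A_i} \sigma_i(a) R_i^T(a) - \sum_{a \in A_i} \sigma'_i(a) R_i^T(a) + \sum_{a \in A_i} \sigma'_i(a) R_i^T(a) \\
&= \sum_{a \in A_i} \left(\sigma_i(a) - \sigma'_i(a)\right) \sum_{t=1}^{T} \left(u_i(a, \sigma_{-i}^t) - u_i(\sigma^t)\right) + \sum_{a \in A_i} \sigma'_i(a) R_i^T(a) \\
&= \sum_{t=1}^{T} \left(u_i(\sigma_i, \sigma_{-i}^t) - u_i(\sigma'_i, \sigma_{-i}^t)\right) + \sum_{a \in A_i} \sigma'_i(a) R_i^T(a) \\
&= \sum_{\substack{1 \le t \le T \\ \operatorname{supp}(\sigma_{-i}^t) \not\subseteq B_{-i}}} \left(u_i(\sigma_i, \sigma_{-i}^t) - u_i(\sigma'_i, \sigma_{-i}^t)\right)
 + \sum_{\substack{1 \le t \le T \\ \operatorname{supp}(\sigma_{-i}^t) \subseteq B_{-i}}} \left(u_i(\sigma_i, \sigma_{-i}^t) - u_i(\sigma'_i, \sigma_{-i}^t)\right)
 + \sum_{a \in A_i} \sigma'_i(a) R_i^T(a) \\
&= \sum_{\substack{1 \le t \le T \\ \operatorname{supp}(\sigma_{-i}^t) \not\subseteq B_{-i}}} \left(u_i(\sigma_i, \sigma_{-i}^t) - u_i(\sigma'_i, \sigma_{-i}^t)\right)
 + \sum_{\substack{1 \le t \le T \\ \operatorname{supp}(\sigma_{-i}^t) \subseteq B_{-i}}} \sum_{a_{-i} \in B_{-i}} \sigma_{-i}^t(a_{-i}) \left(u_i(\sigma_i, a_{-i}) - u_i(\sigma'_i, a_{-i})\right) \\
&\quad + \sum_{a \in A_i} \sigma'_i(a) R_i^T(a), \quad \text{where } \sigma_{-i}(a_{-i}) = \prod_{j \ne i} \sigma_j(a_j) \\
&\le \sum_{\substack{1 \le t \le T \\ \operatorname{supp}(\sigma_{-i}^t) \not\subseteq B_{-i}}} \left(u_i(\sigma_i, \sigma_{-i}^t) - u_i(\sigma'_i, \sigma_{-i}^t)\right)
 + \sum_{\substack{1 \le t \le T \\ \operatorname{supp}(\sigma_{-i}^t) \subseteq B_{-i}}} (-\epsilon) + \max_{a \in A_i} R_i^T(a). \qquad \text{(A.1)}
\end{aligned}$$

We claim that there exists an integer $T_0$ such that for all $T \ge T_0$, there
exists $a \in \operatorname{supp}(\sigma_i)$ such that $R_i^T(a) < 0$ and $R_i^T(a) < \max_{b \in A_i} R_i^T(b)$. By
our assumption, this implies that for all $T \ge T_0$, there exists an action
$a \in \operatorname{supp}(\sigma_i)$ such that $a \notin \operatorname{supp}(\sigma_i^{T+1})$, establishing the theorem.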

To complete the proof, it remains to establish the claim, which we prove
by strong induction on $k$. For the base case $k = 0$, we have $B_{-i} = A_{-i}$, and
so by equation (A.1) we have

$$\begin{aligned}
\min_{a \in \operatorname{supp}(\sigma_i)} R_i^T(a) &\le \sum_{a \in A_i} \sigma_i(a) R_i^T(a) \\
&\le -\epsilon T + \max_{a \in A_i} R_i^T(a) \qquad \text{(A.2)} \\
&\le -\epsilon T + R_i^{T,+}.
\end{aligned}$$

Dividing both sides by $T$ and taking the limit superior gives

$$\limsup_{T \to \infty} \frac{1}{T} \min_{a \in \operatorname{supp}(\sigma_i)} R_i^T(a) \le -\epsilon + \limsup_{T \to \infty} \frac{R_i^{T,+}}{T} = -\epsilon < 0.$$

Thus, there exists an integer $T_0$ such that for all $T \ge T_0$, $R_i^T(a^*) < 0$ where
$a^* = \operatorname{argmin}_{a \in \operatorname{supp}(\sigma_i)} R_i^T(a)$. Also, by equation (A.2), $R_i^T(a^*) \le -\epsilon T + \max_{a \in A_i} R_i^T(a) < \max_{a \in A_i} R_i^T(a)$, completing the base case.

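For the induction step, we may assume that there exist integers $T_1, \ldots, T_k$
such that for all $1 \le \ell \le k$, $T \ge T_\ell$, $R_{j_\ell}^T(a^\ell) < 0$ and $R_{j_\ell}^T(a^\ell) < \max_{b \in A_{j_\ell}} R_{j_\ell}^T(b)$.
This means that for all $T \ge T_0' = \max\{T_1, \ldots, T_k\}$,
$a^\ell \notin \operatorname{supp}(\sigma_{j_\ell}^T)$ for all $1 \le \ell \le k$. Hence, $\operatorname{supp}(\sigma_{-i}^T) \subseteq B_{-i}$ for all $T \ge T_0'$.
Therefore, again setting $a^* = \operatorname{argmin}_{a \in \operatorname{supp}(\sigma_i)} R_i^T(a)$, by equation (A.1) we
have

$$\begin{aligned}
R_i^T(a^*) &\le \sum_{\substack{1 \le t \le T \\ \operatorname{supp}(\sigma_{-i}^t) \not\subseteq B_{-i}}} \left(u_i(\sigma_i, \sigma_{-i}^t) - u_i(\sigma'_i, \sigma_{-i}^t)\right)
 + \sum_{\substack{1 \le t \le T \\ \operatorname{supp}(\sigma_{-i}^t) \subseteq B_{-i}}} (-\epsilon) + \max_{a \in A_i} R_i^T(a) \\
&\le T_0' \Delta_i - \epsilon (T - T_0') + \max_{a \in A_i} R_i^T(a), \quad \text{where } \Delta_i = \max_{a, a' \in A} u_i(a) - u_i(a') \qquad \text{(A.3)} \\
&\le T_0' \Delta_i - \epsilon (T - T_0') + R_i^{T,+}.
\end{aligned}$$

Dividing both sides by $T$ and taking the limit superior gives

$$\limsup_{T \to \infty} \frac{R_i^T(a^*)}{T} \le \limsup_{T \to \infty} \left( \frac{T_0' \Delta_i}{T} - \frac{\epsilon (T - T_0')}{T} + \frac{R_i^{T,+}}{T} \right) = -\epsilon < 0.$$

Thus, there exists an integer $T_0$ such that for all $T \ge T_0$, $T_0' \Delta_i < \epsilon (T - T_0')$
and $R_i^T(a^*) < 0$. By equation (A.3), this also means that for $T \ge T_0$,
$R_i^T(a^*) < \max_{a \in A_i} R_i^T(a)$, completing the induction step. This establishes
the claim and completes the proof. □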
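
The statement of Theorem 2 can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the payoff matrices `U_ROW` and `U_COL` are hypothetical, and the tie-breaking rule used when no action has positive regret (play a maximum-regret action) is an assumption made so that the zero-probability condition of the theorem holds on every iteration. In this game the row player's third action is strictly dominated by the first, since it always pays exactly 1 less.

```python
def regret_matching(cum_regret):
    """Next strategy from cumulative regrets, via regret matching."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    # Assumption (not from the paper): with no positive regret, play a
    # maximum-regret action, so any action whose regret is negative and
    # below the maximum still receives probability zero.
    best = max(range(len(cum_regret)), key=lambda a: cum_regret[a])
    return [1.0 if a == best else 0.0 for a in range(len(cum_regret))]


def simulate(u_row, u_col, iterations):
    """Run regret matching for both players of a two-player normal-form game."""
    n_row, n_col = len(u_row), len(u_row[0])
    regret_row = [0.0] * n_row
    regret_col = [0.0] * n_col
    # Last iteration on which each row action had positive probability.
    last_positive = [0] * n_row
    for t in range(1, iterations + 1):
        s_row = regret_matching(regret_row)
        s_col = regret_matching(regret_col)
        for a in range(n_row):
            if s_row[a] > 0.0:
                last_positive[a] = t
        # Expected utility of each pure action against the opponent's strategy.
        v_row = [sum(u_row[a][b] * s_col[b] for b in range(n_col)) for a in range(n_row)]
        v_col = [sum(u_col[a][b] * s_row[a] for a in range(n_row)) for b in range(n_col)]
        ev_row = sum(s_row[a] * v_row[a] for a in range(n_row))
        ev_col = sum(s_col[b] * v_col[b] for b in range(n_col))
        # Accumulate external regret for every action.
        for a in range(n_row):
            regret_row[a] += v_row[a] - ev_row
        for b in range(n_col):
            regret_col[b] += v_col[b] - ev_col
    return regret_matching(regret_row), regret_row, last_positive


# Hypothetical general-sum game: row action 2 is strictly dominated by row action 0.
U_ROW = [[3.0, 0.0], [0.0, 2.0], [2.0, -1.0]]
U_COL = [[1.0, 2.0], [2.0, 0.0], [0.0, 3.0]]
```

Calling `simulate(U_ROW, U_COL, 1000)` returns the row player's next strategy, the cumulative regrets, and the last iteration on which each row action was played. Because the dominated action accumulates regret exactly 1 lower per iteration than the action dominating it, and regret matching keeps cumulative regret sublinear, its regret eventually stays negative and it drops out of the support for good.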

Before proving Theorems 3 and 4, we need an additional lemma. For
$\sigma_i \in \Sigma_i$ and $I \in \mathcal{I}_i$, define the full counterfactual regret for $\sigma_i$ at $I$ to be

$$R_{i,\mathrm{full}}^T(I, \sigma_i) = \sum_{t=1}^{T} \left( v_i(I, (\sigma_i, \sigma_{-i}^t)) - v_i(I, \sigma^t) \right).$$

We begin by relating full counterfactual regret to a sum over cumulative
counterfactual regrets. This step was part of the original CFR analysis [33],
but we relate these terms here in a slightly different form. For $I, I' \in \mathcal{I}_i$,
$h \in I$, $h' \in I'$, and $\sigma_i \in \Sigma_i$, define $\pi_i^{\sigma}(I, I') = \pi_i^{\sigma}(h, h')$, which is well-defined
due to perfect recall.

Lemma 1.

$$R_{i,\mathrm{full}}^T(I, \sigma_i) = \sum_{I' \in D(I)} \pi_i^{\sigma}(I, I') \sum_{a \in A(I')} \sigma_i(I', a)\, R_i^T(I', a).$$

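Proof. We prove the lemma by strong induction on $|D(I)|$. For $I \in \mathcal{I}_i$ and
$a \in A(I)$, define

$$S(I, a) = \{ I' \in \mathcal{I}_i \mid \exists h \in I, h' \in I' \text{ where } ha \sqsubseteq h' \text{ and } \nexists h'' \in H_i \text{ where } ha \sqsubseteq h'' \sqsubset h' \}$$

to be the set of all possible successor information sets for player $i$ after taking
action $a$ at $I$. In addition, define $Z(I, a)$ to be the set of terminal histories
where the last action taken by player $i$ was $a$ at $I$. To begin,

$$\begin{aligned}
R_{i,\mathrm{full}}^T(I, \sigma_i) &= \sum_{t=1}^{T} v_i(I, (\sigma_i, \sigma_{-i}^t)) - \sum_{t=1}^{T} v_i(I, \sigma^t) \\
&= \sum_{t=1}^{T} \sum_{a \in A(I)} \sigma_i(I, a)\, v_i(I, (\sigma_{i(I \to a)}, \sigma_{-i}^t)) - \sum_{t=1}^{T} v_i(I, \sigma^t) \\
&= \sum_{a \in A(I)} \sigma_i(I, a) \left( \sum_{t=1}^{T} \sum_{z \in Z(I,a)} \pi_{-i}^{\sigma^t}(z)\, u_i(z) + \sum_{I' \in S(I,a)} \sum_{t=1}^{T} v_i(I', (\sigma_i, \sigma_{-i}^t)) \right) - \sum_{t=1}^{T} v_i(I, \sigma^t). \qquad \text{(A.4)}
\end{aligned}$$

For the base case $D(I) = \{I\}$, we have $S(I, a) = \emptyset$ and $\bigcup_{a \in A(I)} Z(I, a) = Z_I$, and so
the right hand side of equation (A.4) reduces to $\sum_{a \in A(I)} \sigma_i(I, a) R_i^T(I, a)$ as
desired. For the induction step, note that $|D(I')| < |D(I)|$ for all $I' \in S(I, a)$,
and so we may apply the induction hypothesis to get, for all $I' \in S(I, a)$,

$$\begin{aligned}
\sum_{t=1}^{T} v_i(I', (\sigma_i, \sigma_{-i}^t)) &= R_{i,\mathrm{full}}^T(I', \sigma_i) + \sum_{t=1}^{T} v_i(I', \sigma^t) \\
&= \sum_{I'' \in D(I')} \pi_i^{\sigma}(I', I'') \sum_{b \in A(I'')} \sigma_i(I'', b)\, R_i^T(I'', b) + \sum_{t=1}^{T} v_i(I', \sigma^t).
\end{aligned}$$

Finally, substituting into equation (A.4), we have

$$\begin{aligned}
R_{i,\mathrm{full}}^T(I, \sigma_i) &= \sum_{a \in A(I)} \sigma_i(I, a) \Bigg( \sum_{t=1}^{T} \sum_{z \in Z(I,a)} \pi_{-i}^{\sigma^t}(z)\, u_i(z) \\
&\qquad + \sum_{I' \in S(I,a)} \bigg( \sum_{I'' \in D(I')} \pi_i^{\sigma}(I', I'') \sum_{b \in A(I'')} \sigma_i(I'', b)\, R_i^T(I'', b) + \sum_{t=1}^{T} v_i(I', \sigma^t) \bigg) \Bigg) - \sum_{t=1}^{T} v_i(I, \sigma^t) \\
&= \sum_{a \in A(I)} \sigma_i(I, a) \left( \sum_{t=1}^{T} v_i(I, \sigma^t_{(I \to a)}) - \sum_{t=1}^{T} v_i(I, \sigma^t) \right) \\
&\qquad + \sum_{a \in A(I)} \sigma_i(I, a) \sum_{I' \in S(I,a)} \left( \sum_{I'' \in D(I')} \pi_i^{\sigma}(I', I'') \sum_{b \in A(I'')} \sigma_i(I'', b)\, R_i^T(I'', b) \right) \\
&= \sum_{a \in A(I)} \sigma_i(I, a)\, R_i^T(I, a) + \sum_{\substack{I' \in D(I) \\ I' \ne I}} \pi_i^{\sigma}(I, I') \sum_{b \in A(I')} \sigma_i(I', b)\, R_i^T(I', b) \\
&= \sum_{I' \in D(I)} \pi_i^{\sigma}(I, I') \sum_{a \in A(I')} \sigma_i(I', a)\, R_i^T(I', a),
\end{aligned}$$

completing the proof. □
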
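Corollary 1.

$$R_{i,\mathrm{full}}^T(I, \sigma_i) \le \Delta_i\, |D(I)|\, \sqrt{|A(\mathcal{I}_i)|\, T}.$$

Proof. By Lemma 1,

$$\begin{aligned}
R_{i,\mathrm{full}}^T(I, \sigma_i) &= \sum_{I' \in D(I)} \pi_i^{\sigma}(I, I') \sum_{a \in A(I')} \sigma_i(I', a)\, R_i^T(I', a) \\
&\le \sum_{I' \in D(I)} \max_{a \in A(I')} R_i^{T,+}(I', a) \\
&\le |D(I)|\, \Delta_i \sqrt{|A(\mathcal{I}_i)|\, T}
\end{aligned}$$

by equation (4). □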

Theorem 3. Let $\sigma^1, \sigma^2, \ldots$ be strategy profiles generated by CFR in an
extensive-form game, let $I \in \mathcal{I}_i$, and let $a$ be an iteratively strictly
dominated action at $I$, where removal in sequence of the iteratively strictly
dominated actions $a_1, \ldots, a_k$ at $I_1, \ldots, I_k$ respectively yields iterative dominance of
$a_{k+1} = a$. If for $1 \le \ell \le k + 1$, there exist real numbers $\delta_\ell, \gamma_\ell > 0$ and an
integer $T_\ell$ such that for all $T \ge T_\ell$, $|\Sigma_{\delta_\ell}(I_\ell) \cap \{\sigma^t \mid T_\ell \le t \le T\}| \ge \gamma_\ell T$, then

(i) there exists an integer $T_0$ such that for all $T \ge T_0$, $R_i^T(I, a) < 0$,

(ii) if $\lim_{T \to \infty} x^T / T = 0$, then $\lim_{T \to \infty} y^T(I, a) / T = 0$, where $y^T(I, a)$ is
the number of iterations $1 \le t \le T$ satisfying $\sigma_i^t(I, a) > 0$, and

(iii) if $\lim_{T \to \infty} x^T / T = 0$, then $\lim_{T \to \infty} \pi_i^{\bar{\sigma}^T}(I)\, \bar{\sigma}_i^T(I, a) = 0$.

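Proof. We will first prove parts (i) and (ii) by strong induction on $k$, followed
by proving (iii) from (ii). For $\delta \ge 0$, let $\hat{\Sigma}_\delta(I) = \{\sigma \in \Sigma_\delta(I) \mid \sigma(I_\ell, a_\ell) = 0, 1 \le \ell \le k\}$
be the set of strategies in $\Sigma_\delta(I)$ that do not play $a_1, \ldots, a_k$. By
iterative strict domination of $a$, there exists $\sigma'_i \in \Sigma_i$ such that $v_i(I, \sigma_{(I \to a)}) < v_i(I, (\sigma'_i, \sigma_{-i}))$
for all $\sigma \in \hat{\Sigma}_0(I)$. Next, let $\delta = \delta_{k+1}$ and $\gamma = \gamma_{k+1}$. Then,
since $\hat{\Sigma}_\delta(I)$ is a closed and bounded set and $v_i(I, \cdot)$ is continuous, by the
Bolzano-Weierstrass Theorem there exists an $\epsilon > 0$ such that $v_i(I, \sigma_{(I \to a)}) \le v_i(I, (\sigma'_i, \sigma_{-i})) - \epsilon$
for all $\sigma \in \hat{\Sigma}_\delta(I)$. Then,

$$\begin{aligned}
R_i^T(I, a) &= R_i^T(I, a) - R_{i,\mathrm{full}}^T(I, \sigma'_i) + R_{i,\mathrm{full}}^T(I, \sigma'_i) \\
&= \sum_{t=1}^{T} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, (\sigma'_i, \sigma_{-i}^t)) \right) + R_{i,\mathrm{full}}^T(I, \sigma'_i) \\
&= \sum_{t=1}^{T_0' - 1} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, (\sigma'_i, \sigma_{-i}^t)) \right)
 + \sum_{\substack{T_0' \le t \le T \\ \sigma^t \notin \hat{\Sigma}_0(I)}} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, (\sigma'_i, \sigma_{-i}^t)) \right) \\
&\quad + \sum_{\substack{T_0' \le t \le T \\ \sigma^t \in \hat{\Sigma}_\delta(I)}} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, (\sigma'_i, \sigma_{-i}^t)) \right)
 + \sum_{\substack{T_0' \le t \le T \\ \sigma^t \in \hat{\Sigma}_0(I) \setminus \hat{\Sigma}_\delta(I)}} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, (\sigma'_i, \sigma_{-i}^t)) \right) \\
&\quad + R_{i,\mathrm{full}}^T(I, \sigma'_i). \qquad \text{(A.5)}
\end{aligned}$$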

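For the base case $k = 0$, we have $\hat{\Sigma}_0(I) = \Sigma$ and $\hat{\Sigma}_\delta(I) = \Sigma_\delta(I)$. Choose $T_0$
to be any integer greater than $\max\{T_0', \Delta_i^2 |D(I)|^2 |A(\mathcal{I}_i)| / \epsilon^2 \gamma^2\}$ so that for all
$T \ge T_0$,

$$\begin{aligned}
R_i^T(I, a) &= \sum_{t=1}^{T_0' - 1} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, (\sigma'_i, \sigma_{-i}^t)) \right)
 + \sum_{\substack{T_0' \le t \le T \\ \sigma^t \in \hat{\Sigma}_\delta(I)}} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, (\sigma'_i, \sigma_{-i}^t)) \right) \\
&\quad + \sum_{\substack{T_0' \le t \le T \\ \sigma^t \in \hat{\Sigma}_0(I) \setminus \hat{\Sigma}_\delta(I)}} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, (\sigma'_i, \sigma_{-i}^t)) \right) + R_{i,\mathrm{full}}^T(I, \sigma'_i) \\
&\le -\epsilon\, |\Sigma_\delta(I) \cap \{\sigma^t \mid T_0' \le t \le T\}| + R_{i,\mathrm{full}}^T(I, \sigma'_i) \\
&\le -\epsilon \gamma T + \Delta_i\, |D(I)| \sqrt{|A(\mathcal{I}_i)|\, T} \quad \text{by Corollary 1} \\
&< 0
\end{aligned}$$

by choice of $T_0$. This establishes part (i) of the base case. For part (ii),
since CFR applies regret matching at $I$, by equation (5) it follows that for
all $T \ge T_0$, either $\sum_{b \in A(I)} R_i^{T,+}(I, b) = 0$ or $\sigma_i^{T+1}(I, a) = 0$. Thus,

$$\begin{aligned}
\lim_{T \to \infty} \frac{y^T(I, a)}{T} &= \lim_{T \to \infty} \frac{y^{T_0}(I, a) + \left( y^T(I, a) - y^{T_0}(I, a) \right)}{T} \\
&\le \lim_{T \to \infty} \frac{T_0 + x^T}{T} \\
&= 0.
\end{aligned}$$

Thus, (ii) holds and we have established the base case of our induction.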
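
The final strict inequality in part (i) of the base case follows from the lower bound imposed on $T_0$. The short calculation below spells this out; it uses only quantities already defined above ($\Delta_i$, $|D(I)|$, $|A(\mathcal{I}_i)|$, $\epsilon$, $\gamma$) and the fact that all of them, together with $T$, are positive.

```latex
\begin{align*}
T > \frac{\Delta_i^2\, |D(I)|^2\, |A(\mathcal{I}_i)|}{\epsilon^2 \gamma^2}
  &\iff \epsilon^2 \gamma^2 T^2 > \Delta_i^2\, |D(I)|^2\, |A(\mathcal{I}_i)|\, T \\
  &\iff \epsilon \gamma T > \Delta_i\, |D(I)| \sqrt{|A(\mathcal{I}_i)|\, T} \\
  &\iff -\epsilon \gamma T + \Delta_i\, |D(I)| \sqrt{|A(\mathcal{I}_i)|\, T} < 0 .
\end{align*}
```

The first equivalence multiplies both sides by $\epsilon^2 \gamma^2 T > 0$, and the second takes square roots of two positive quantities.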

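For the induction step, we now assume that parts (i) and (ii) hold for
all $a_1, \ldots, a_k$. We will show that there exists an integer $T_0$ such that for all
$T \ge T_0$, $R_i^T(I, a) < 0$. This will establish part (i), and part (ii) will then
follow as before to complete the induction step.
Firstly, note that

$$\sum_{\substack{T_0' \le t \le T \\ \sigma^t \in \hat{\Sigma}_0(I) \setminus \hat{\Sigma}_\delta(I)}} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, (\sigma'_i, \sigma_{-i}^t)) \right) \le 0$$

by iterative domination of $a$. Secondly,

$$\begin{aligned}
\sum_{\substack{T_0' \le t \le T \\ \sigma^t \in \hat{\Sigma}_\delta(I)}} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, (\sigma'_i, \sigma_{-i}^t)) \right)
&\le -\epsilon\, |\hat{\Sigma}_\delta(I) \cap \{\sigma^t \mid T_0' \le t \le T\}| \\
&= -\epsilon \left( |\Sigma_\delta(I) \cap \{\sigma^t \mid T_0' \le t \le T\}| - |(\Sigma_\delta(I) \setminus \hat{\Sigma}_\delta(I)) \cap \{\sigma^t \mid T_0' \le t \le T\}| \right) \\
&\le -\epsilon \gamma T + \epsilon \sum_{\ell=1}^{k} y^T(I_\ell, a_\ell).
\end{aligned}$$

Thirdly,

$$\sum_{\substack{T_0' \le t \le T \\ \sigma^t \notin \hat{\Sigma}_0(I)}} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, (\sigma'_i, \sigma_{-i}^t)) \right) \le \Delta_i \sum_{\ell=1}^{k} y^T(I_\ell, a_\ell).$$

Thus, substituting these three inequalities and Corollary 1 into equation
(A.5) gives

$$\begin{aligned}
R_i^T(I, a) &\le \sum_{t=1}^{T_0' - 1} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, (\sigma'_i, \sigma_{-i}^t)) \right) \\
&\quad + \Delta_i \sum_{\ell=1}^{k} y^T(I_\ell, a_\ell) - \epsilon \gamma T + \epsilon \sum_{\ell=1}^{k} y^T(I_\ell, a_\ell) + \Delta_i\, |D(I)| \sqrt{|A(\mathcal{I}_i)|\, T}.
\end{aligned}$$

Dividing both sides by $T$ and taking the limit superior gives

$$\begin{aligned}
\limsup_{T \to \infty} \frac{R_i^T(I, a)}{T} &\le \sum_{t=1}^{T_0' - 1} \left( v_i(I, \sigma^t_{(I \to a)}) - v_i(I, (\sigma'_i, \sigma_{-i}^t)) \right) \limsup_{T \to \infty} \frac{1}{T} \\
&\quad + (\Delta_i + \epsilon) \sum_{\ell=1}^{k} \limsup_{T \to \infty} \frac{y^T(I_\ell, a_\ell)}{T} - \epsilon \gamma + \Delta_i\, |D(I)| \sqrt{|A(\mathcal{I}_i)|} \limsup_{T \to \infty} \frac{1}{\sqrt{T}} \\
&= -\epsilon \gamma \\
&< 0
\end{aligned}$$

by applying part (ii) of the induction hypothesis. Therefore, there exists an
integer $T_0$ such that for all $T \ge T_0$, $R_i^T(I, a)/T < 0$ and thus $R_i^T(I, a) < 0$,
completing the induction step.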

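Parts (i) and (ii) are now proven. It remains to prove (iii). To that end,

$$\begin{aligned}
\lim_{T \to \infty} \pi_i^{\bar{\sigma}^T}(I)\, \bar{\sigma}_i^T(I, a) &= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \pi_i^{\sigma^t}(I)\, \sigma_i^t(I, a) \\
&\le \lim_{T \to \infty} \frac{y^T(I, a)}{T} \\
&= 0
\end{aligned}$$

by part (ii). Since $\pi_i^{\bar{\sigma}^T}(I)\, \bar{\sigma}_i^T(I, a)$ is nonnegative, it follows that
$\lim_{T \to \infty} \pi_i^{\bar{\sigma}^T}(I)\, \bar{\sigma}_i^T(I, a) = 0$, completing the proof. □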
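
The first equality in the proof of part (iii) relies on how CFR's average strategy is defined. The derivation below is a sketch under an assumption: it takes the standard CFR definition, in which each iteration's strategy at $I$ is weighted by player $i$'s own probability of reaching $I$. That definition is given earlier in the paper and is restated here only for convenience.

```latex
\begin{align*}
\bar{\sigma}_i^T(I,a)
  &= \frac{\sum_{t=1}^{T} \pi_i^{\sigma^t}(I)\, \sigma_i^t(I,a)}
          {\sum_{t=1}^{T} \pi_i^{\sigma^t}(I)},
  \qquad
  \pi_i^{\bar{\sigma}^T}(I) = \frac{1}{T} \sum_{t=1}^{T} \pi_i^{\sigma^t}(I), \\
\pi_i^{\bar{\sigma}^T}(I)\, \bar{\sigma}_i^T(I,a)
  &= \frac{1}{T} \sum_{t=1}^{T} \pi_i^{\sigma^t}(I)\, \sigma_i^t(I,a)
  \;\le\; \frac{y^T(I,a)}{T}.
\end{align*}
```

The inequality holds because each summand is at most 1 and is nonzero only on iterations with $\sigma_i^t(I,a) > 0$, which are exactly the iterations counted by $y^T(I,a)$.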

Theorem 4. Let $\sigma^1, \sigma^2, \ldots$ be strategy profiles generated by CFR in an
extensive-form game, and let $\sigma_i$ be an iteratively strictly dominated strategy.
Then,

(i) there exists an integer $T_0$ such that for all $T \ge T_0$, there exist $I \in \mathcal{I}_i$,
$a \in A(I)$ such that $\pi_i^{\sigma}(I)\, \sigma_i(I, a) > 0$ and $R_i^T(I, a) < 0$, and

(ii) if $\lim_{T \to \infty} x^T / T = 0$, then $\lim_{T \to \infty} y^T(\sigma_i) / T = 0$, where $y^T(\sigma_i)$ is the
number of iterations $1 \le t \le T$ satisfying $\operatorname{supp}(\sigma_i) \subseteq \operatorname{supp}(\sigma_i^t)$.

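Proof. Let $s_{j_1}^1, s_{j_2}^2, \ldots, s_{j_k}^k$ be iteratively strictly dominated pure strategies
that, once removed in sequence, yield strict domination of $\sigma_i$. Let $\hat{S}_{-i} = S_{-i} \setminus \{s_{j_1}^1, s_{j_2}^2, \ldots, s_{j_k}^k\}$
be the set of opponent pure strategy profiles that do
not play any of $s_{j_1}^1, s_{j_2}^2, \ldots, s_{j_k}^k$. Next, by iterative strict domination of $\sigma_i$ and
because the game is finite, there exists another strategy $\sigma'_i \in \Sigma_i$ such that

$$\epsilon = \min_{s_{-i} \in \hat{S}_{-i}} u_i(\sigma'_i, s_{-i}) - u_i(\sigma_i, s_{-i}) > 0,$$

so that $u_i(\sigma_i, s_{-i}) \le u_i(\sigma'_i, s_{-i}) - \epsilon$ for all $s_{-i} \in \hat{S}_{-i}$.

For $\sigma_i \in \Sigma_i$, define $R_{i,\mathrm{full}}^T(\sigma_i) = \sum_{t=1}^{T} \left( u_i(\sigma_i, \sigma_{-i}^t) - u_i(\sigma^t) \right)$. Note that

$$R_{i,\mathrm{full}}^T(\sigma_i) = \sum_{I \in \hat{\mathcal{I}}_i} R_{i,\mathrm{full}}^T(I, \sigma_i),$$

where $\hat{\mathcal{I}}_i = \{I \in \mathcal{I}_i \mid \forall h \in I, h' \sqsubset h, P(h') \ne i\}$ is the set of all possible
first information sets for player $i$ reached. So, by Corollary 1, $R_{i,\mathrm{full}}^T(\sigma_i) \le \Delta_i\, |\mathcal{I}_i| \sqrt{|A(\mathcal{I}_i)|\, T}$
for all $\sigma_i \in \Sigma_i$. Then by Lemma 1, we have

$$\begin{aligned}
\sum_{I \in \mathcal{I}_i} \pi_i^{\sigma}(I) \sum_{a \in A(I)} \sigma_i(I, a)\, R_i^T(I, a)
&= R_{i,\mathrm{full}}^T(\sigma_i) \\
&= \sum_{t=1}^{T} \left( u_i(\sigma_i, \sigma_{-i}^t) - u_i(\sigma'_i, \sigma_{-i}^t) \right) + R_{i,\mathrm{full}}^T(\sigma'_i) \\
&= \sum_{\substack{1 \le t \le T \\ \operatorname{supp}(\sigma_{-i}^t) \subseteq \hat{S}_{-i}}} \sum_{s_{-i} \in \hat{S}_{-i}} \sigma_{-i}^t(s_{-i}) \left( u_i(\sigma_i, s_{-i}) - u_i(\sigma'_i, s_{-i}) \right) \\
&\quad + \sum_{\substack{1 \le t \le T \\ \operatorname{supp}(\sigma_{-i}^t) \not\subseteq \hat{S}_{-i}}} \left( u_i(\sigma_i, \sigma_{-i}^t) - u_i(\sigma'_i, \sigma_{-i}^t) \right) + R_{i,\mathrm{full}}^T(\sigma'_i), \\
&\qquad \text{where } \sigma_{-i}(s_{-i}) = \prod_{j \ne i} \prod_{I \in \mathcal{I}_j} \sigma_j(I, s_j(I)) \\
&\le -\epsilon \left( T - \sum_{\ell=1}^{k} y^T(s_{j_\ell}^\ell) \right) + \Delta_i \sum_{\ell=1}^{k} y^T(s_{j_\ell}^\ell) + \Delta_i\, |\mathcal{I}_i| \sqrt{|A(\mathcal{I}_i)|\, T}. \qquad \text{(A.6)}
\end{aligned}$$
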

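We claim that

$$\limsup_{T \to \infty} \frac{1}{T} \sum_{I \in \mathcal{I}_i} \pi_i^{\sigma}(I) \sum_{a \in A(I)} \sigma_i(I, a)\, R_i^T(I, a) < 0.$$

Assuming the claim holds, because $(1/T)$, $\pi_i^{\sigma}(I)$, and $\sigma_i(I, a)$ are nonnegative,
it follows that there exists an integer $T_0$ such that for all $T \ge T_0$, there
exist $I \in \mathcal{I}_i$, $a \in A(I)$ such that $\pi_i^{\sigma}(I)\, \sigma_i(I, a) > 0$ and $R_i^T(I, a) < 0$,
establishing (i). For part (ii), note that part (i) and equation (5) imply that for
all $T \ge T_0$, either $\sum_{b \in A(I)} R_i^{T,+}(I, b) = 0$ or $\operatorname{supp}(\sigma_i) \not\subseteq \operatorname{supp}(\sigma_i^{T+1})$. Thus,

$$\begin{aligned}
\lim_{T \to \infty} \frac{y^T(\sigma_i)}{T} &= \lim_{T \to \infty} \frac{y^{T_0}(\sigma_i) + \left( y^T(\sigma_i) - y^{T_0}(\sigma_i) \right)}{T} \\
&\le \lim_{T \to \infty} \frac{T_0 + x^T}{T} \\
&= 0,
\end{aligned}$$

establishing part (ii).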

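To complete the proof, it remains to prove the claim, which we will prove
by induction on $k$. For the base case $k = 0$, equation (A.6) gives

$$\begin{aligned}
\limsup_{T \to \infty} \frac{1}{T} \sum_{I \in \mathcal{I}_i} \pi_i^{\sigma}(I) \sum_{a \in A(I)} \sigma_i(I, a)\, R_i^T(I, a)
&\le \limsup_{T \to \infty} \left( -\epsilon + \frac{\Delta_i\, |\mathcal{I}_i| \sqrt{|A(\mathcal{I}_i)|}}{\sqrt{T}} \right) \\
&= -\epsilon \\
&< 0.
\end{aligned}$$

For the induction step, we may assume that parts (i) and (ii) hold for all
$s_{j_1}^1, s_{j_2}^2, \ldots, s_{j_k}^k$. Then equation (A.6) implies

$$\begin{aligned}
\limsup_{T \to \infty} \frac{1}{T} \sum_{I \in \mathcal{I}_i} \pi_i^{\sigma}(I) \sum_{a \in A(I)} \sigma_i(I, a)\, R_i^T(I, a)
&\le -\epsilon + (\epsilon + \Delta_i) \sum_{\ell=1}^{k} \limsup_{T \to \infty} \frac{y^T(s_{j_\ell}^\ell)}{T} \\
&\quad + \limsup_{T \to \infty} \frac{\Delta_i\, |\mathcal{I}_i| \sqrt{|A(\mathcal{I}_i)|}}{\sqrt{T}} \\
&= -\epsilon \\
&< 0,
\end{aligned}$$

proving the claim. □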